Abstract. We study decision timing problems on a finite horizon with Poissonian information arrivals. 
In our model, a decision maker wishes to optimally time her action in order to maximize 
her expected reward. The reward depends on an unobservable Markovian environment, and in- 
formation about the environment is collected through a (compound) Poisson observation process. 
Examples of such systems arise in investment timing, reliability theory, Bayesian regime detection 
and technology adoption models. We solve the problem by studying an optimal stopping problem 
for a piecewise-deterministic process which gives the posterior likelihoods of the unobservable en- 
vironment. Our method lends itself to simple numerical implementation and we present several 
illustrative numerical examples. 



1. Introduction 

Decision timing under uncertainty is one of the fundamental problems in Operations Research. 
In a typical setting, an economic agent (called the decision-maker or DM) has a set of possible 
actions $\mathcal{A}$, where each action has a (random) reward associated with it. The objective of the DM 
is to select a single action and time it so as to maximize her expected reward. More precisely, the 
DM picks a stopping time $\tau$ and an action $k$ from the set $\mathcal{A}$ at $\tau$. The reward $H$ that the DM receives is a 
function of the pair $(\tau, k)$, as well as of some stochastic state variable $Y$. In classical examples (e.g. 
investment timing, American option pricing, natural resource management, etc.), Y is an observable 
stochastic process (e.g. asset prices, market demand etc.), and the DM's objective is a standard 
optimal stopping problem. 

More complicated stopping problems involving unobserved system states have also been consid- 
ered in the literature; see, for example, [2], [21], [31], [30], [24], [38], [34], [18], [13], [11]. Such 
models are especially natural when one wishes to capture the inherent conflict between gathering 
of information (which makes waiting valuable) and the time value of money (which makes waiting 
costly). Indeed, most realistic settings involve a DM who is only partially aware of the environment 
and must collect data before making a decision. In a multi-period setting, it is natural to capture 
this uncertainty in the environment through an unobservable stochastic process $M = \{M_t\}_{t \ge 0}$, 
where $M_t$ represents the state of the world at time $t$. The DM starts with an initial guess about 
$M$, collects information via relevant news, and updates her beliefs. At the time of decision she then 
receives a reward that depends on the present environment, $H = H(\tau, k, M_\tau)$. 

In such problems, a common approach is to postulate that the process M is a partially observable 
Markov (decision) process (POMDP), in which case we have a hidden Markov model (HMM). We 
refer the reader to [5], [14] for a comprehensive treatment of discrete-time models and to [4], [27] 
for continuous-time models and applications. 

In both discrete- and continuous-time models the analysis separates the sub-problems of estima- 
tion (filtering of M) and control. The second "control" step requires re-formulating the problem 
under an equivalent fully observable system, where the conditional distributions/probabilities of 
the process $M$ constitute the new state variables. In discrete time, the value function is typically 
a fixed point of the corresponding dynamic programming (DP) operator, and can be obtained via 
a recursive application of this operator; see, for example, the models and algorithms in [5], [28]. 
On the other hand, continuous-time formulations allow more sophisticated models, and the dy- 
namic programming principle generally manifests itself in the form of a (partial) differential (delay) 
equation; see [17], [26], [4], [33, Chapter 6] and the references therein for various examples. 

The major distinction between discrete- and continuous-time models comes from the nature of 
the control and the observations: either the system is asynchronous, so that observations and stopping 
can occur at any time, or there are fixed time epochs at which new information is processed and stopping 
decisions are made. A similar distinction exists within continuous-time models. If news (such as 
changes in asset prices) arrives in infinitesimal amounts, then it is intuitive to have a continuum of 
information, which is typically captured by the filtration of an observed diffusion process. However, 
in many instances, a more realistic representation is to use "discrete" information amounts. Corporate 
developments, engineering failures, insurance claims, and economic surveys are all discrete 
events, and the corresponding news arrives in "chunks". Note that discreteness of information is distinct 
from the discreteness of time. The model is still in continuous time, since the events may take 
place at any instant. However, the event itself carries a strictly positive amount of information. 
Moreover, "no news" is still informative and affects the beliefs of the DM. 

Mathematically, discrete information in continuous-time may be represented by the filtration of 
an observed marked point process. In such a model, the instantaneous arrival intensity and the 
distribution of the marks of the point process typically depend on the current state of the process 
M. That is, the observable point process encodes information about the hidden environment M 
via its arrival times and/or marks. Filtering with continuous-time point process observations has 
been considered in [6, 1, 15], and it is known that the dynamics of the conditional probabilities of 
$M$ are of the piecewise deterministic process (PDP) type. In other words, the DM's beliefs evolve 
deterministically between arrivals of new information, and experience random jumps at event times. 
From the control perspective, various aspects of optimal stopping of PDPs have been studied by 
[26], [20] and [7]. 

In this paper, we study a class of finite-horizon decision-making problems within the PDP frame- 
work by considering a general regime-switching model with Poisson information arrivals. Poissonian 
information allows us to capture the discreteness of news while maintaining a rich framework for 
the dependence of the observable X on the unobservable state of M, which can manifest itself 
both in arrival rate and mark distribution effects. In this context, our main contribution is the 
full characterization of the value function and optimal policy of the DM, with a direct proof of the 
dynamic programming principle and characterization of the optimal and $\varepsilon$-optimal policies. Our 
approach also yields a numerical algorithm that can be readily implemented (see Section 6 for examples). 
Within the PDP framework, related problems have been considered by [24] in connection 
with system reliability studies, [23] and [34] in the context of insurance premium re-pricing and 
[32], [19], [3], [12] for classical Poisson disorder and regime detection problems. 

Our model provides a non-trivial generalization of previous analysis of decision making under 
Poissonian information structures. More precisely, we extend existing literature in three directions. 
First, we consider a general continuous-time finite-state Markov chain for the environment variable 
M (without any assumptions on the transition rates), and impose no restriction on the arrival 
rate and mark distribution of the observed compound Poisson process X. The latter allows us 
to model any setting where the DM also gets information via the size/type of each event besides 
the interarrival epochs. Second, we consider a general discount/cost structure, that can be used 
to encode a variety of economic objectives. Finally, we work in the context of finite horizon, 
where value functions are time-inhomogeneous. This is a more realistic setting since a practicing 
DM typically has a well-defined "window" for making her decision. The introduction of time-to-maturity 
as a state variable makes the numerical computation more challenging and leads to 
the appearance of new effects that are not possible with stationary models. At the same time, our 
model allows a natural interpolation from finite to infinite horizon; see Section 4.4. 

Before concluding our discussion here, let us mention that the choice of "discrete-time model" 
versus "continuous-time model with discrete information" will be made according to the preferences 
of the modeler, as well as the nature of the problem. Accordingly, similar applications may invite 
different modeling approaches; for instance, the machine reliability problem discussed in Section 1.1 
below was studied in a discrete-time setting by [37], in a continuous-time setting by [24], and 
even in a hybrid continuous-time model with discrete-epoch observations by [29]. In this context, if 
the machine/production system is subject to major breakdowns, then continuous monitoring may 
be more desirable. In other cases, end-of-day inspections may be more than enough to restore the 
profitability of operations. While the aforementioned formulations are superficially similar (and in 
some specific cases even equivalent, see [16]), the respective solution methods utilize quite different 
tools. The solution of discrete-time models generally relies on the Smallwood-Sondik property [36], 
which shows that with finite state, observation, and action spaces the value function is piecewise linear 
and convex. In continuous time this property no longer holds, and the smoothness of the value 
function must be independently established. Also, in discrete-time models decisions and controls 
are intrinsically paired with observations. In contrast, in the models considered here, the control 
may take place either at event times or between events, which is an important qualitative distinction. 

1.1. A catalogue of sample problems. Since the framework studied throughout the paper is 
general, let us first provide a number of motivating examples illustrating the applications in various 
settings. 

Profit Maximization with Information Cost. Let us consider an insurance company which is 
planning to launch a new policy/product to its clients. The frequency of corresponding insurance 
claims and the severity of claim sizes are not known precisely. Rather, they depend on the current 
quality of the insurance portfolio, represented by a Markov process $M = \{M_t\}_{t \ge 0}$ taking values in 
some space $E = \{1, \ldots, n\}$. Once the policy is launched, it yields a random payoff that depends 
on the current state of $M$ only. To model this, we say that when $M$ is at state $i \in E$ at the 
launch time, the random payoff is given by an independent random variable $\Phi_i$ with some finite 
mean $\mu_i = \mathbb{E}[\Phi_i]$. 

Information about $M$ is obtained through the filed claims process $X = \{X_t\}_{t \ge 0}$ received by the 
firm. The cumulative claim process has the form $X_t = \sum_{j=1}^{N_t} Y_j$ for $t \ge 0$. Here $N_t$ is the total 
number of claims up to time $t$, and $Y_j$ is the size of the $j$'th claim for $j \in \mathbb{N}$. The process $N$ 
is a simple Poisson process with intensity $\lambda_i$ whenever $M$ is at state $i \in E$. Moreover, if a claim 
is known to occur when $M$ is at state $i$, the claim size is an independent random variable with 
distribution $\nu_i$. 

At any time prior to some terminal time $T < \infty$, the company may launch the product or 
permanently abandon it. Alternatively, it can delay this decision to obtain more information on 
$M$, and to increase the likelihood of catching $M$ at a favorable state. However, waiting for additional 
information costs $c < 0$ per unit time. Therefore, the company must decide how long it observes 
$X$ prior to a decision, and what decision (launch vs. quit) should be taken at that time. 

Let $\tau \le T$ denote the decision time, and let the random variable $d \in \{0, 1\}$ indicate whether the 
product is released or abandoned. That is, on the event $\{d = 1\}$ the company launches the product, 
and on $\{d = 0\}$ it quits. Clearly, the time $\tau$ should be determined based on the observations from 
the claim process $X$, and the choice of action $d$ should be determined solely by the information 
generated by $X$ until $\tau$. Then, the objective of the company is to compute 

$$\sup \, \mathbb{E}^{\vec{\pi}} \left[ \int_0^{\tau} e^{-\rho t} c \, dt + e^{-\rho \tau} 1_{\{d = 1\}} \sum_{i \in E} \mu_i \cdot 1_{\{M_\tau = i\}} \right] \tag{1.1}$$

over all such pairs $(\tau, d)$. In (1.1), $\rho \ge 0$ is a given discount rate used by the company in reference 
to future revenues, and $\vec{\pi} = (\pi_1, \ldots, \pi_n) = (\mathbb{P}(M_0 = 1), \ldots, \mathbb{P}(M_0 = n))$ denotes the initial beliefs 
of the company about the state of $M$ at $t = 0$. 

A related problem has been considered on an infinite horizon by [34], who maximizes the future risk 
reserves of the insurance company, where at the time $\tau$ the company re-calculates its premiums. We 
also refer the reader to [13] and [39] for recent work on timing project commitment/abandonment 
in continuous and discrete time, respectively. 

Bayesian Regime Detection. In this problem, a compound Poisson process $X = \{X_t\}_{t \ge 0}$ is 
observed starting from $t = 0$. The arrival rate $\lambda$ and mark distribution $\nu$ of $X$ are not known 
precisely. Rather, they depend on the static regime of the Markov process $M$ with $n$ absorbing 
states (i.e., $M_t = M_0$ for all $t \ge 0$). Each state corresponds to the realization of one of the $n$ simple 
hypotheses 

$$H_1 : (\lambda, \nu) = (\lambda_1, \nu_1), \quad \ldots, \quad H_n : (\lambda, \nu) = (\lambda_n, \nu_n), \tag{1.2}$$

with given prior likelihoods $\pi_i$, for $i = 1, \ldots, n$. The objective of the DM is to recognize the current 
regime as quickly as possible, with minimal probability of a wrong decision. 

In earlier work on this problem, the trade-off between observing and stopping is generally modeled 
via the Bayes risk 

$$\mathbb{E}^{\vec{\pi}} \left[ \tau + \sum_{k, i = 1}^{n} \mu_{k,i} 1_{\{d = k,\, M_0 = i\}} \right], \tag{1.3}$$

where $\tau$ is the decision time, $d \in \{1, \ldots, n\}$ represents the hypothesis selected, and $\mu_{k,i} \ge 0$ is the 
cost of selecting the wrong hypothesis $H_k$ when the correct one is $H_i$. The DM then needs to 
minimize (1.3) and find a pair $(\tau, d)$, if one exists, that attains the infimum. 

The infinite horizon version of (1.3) was solved for the first time by [32] for a simple Poisson 
process with $n = 2$. Later, [19] provided the solution (again with $n = 2$), where the jump size 
is exponentially distributed under each hypothesis, with the mean of the exponential distribution 
the same as the proposed arrival rate. The solution for any jump distribution and for $n \in \mathbb{N}$ was 
recently provided by [12]. Our model in this paper can be viewed as the finite horizon version of 
that problem, where a decision must be made before a terminal time $T < \infty$. 

Optimal Replacement Time of a Reliability System. [24] consider an optimal stopping prob- 
lem in reliability with a partially observed Poisson process. The problem is to find when to discard 
or replace a machine/production-system whose production quality deteriorates over time due to 
the usual wear-and-tear. The status of the machine is modeled with a finite state Markov process 
M. The process moves from good states to bad states over time. Eventually it ends in the n'th 
absorbing state which represents an unacceptable quality level. 

The DM observes the failure times $\sigma_1, \sigma_2, \ldots$ (the failures can also be interpreted as defective 
items in the context of a machine); it is assumed that the corresponding "arrivals" form a Poisson 
process whose intensity is $\lambda_i$ when the current state of the process $M$ is $i \in E = \{1, \ldots, n\}$. 
Running the system in state $i$ yields a net payoff $c_i \in \mathbb{R}$ per unit time. A high $c_i$ indicates that 
the machine is profitable, while a negative $c_i$, including the assumed $c_n < 0$, means that the low 
quality outweighs the benefits. At any time the DM can stop running the machine and replace it, 
with a terminal cost of $\mu_i$ if the process $M$ happens to be in state $i \in E$ at that time. [24] then 
solve the problem of maximizing 



over all random times $\tau$ (whose value is determined by the history generated by the arrival process) 
and under certain assumptions on the arrival rates $\lambda_i$, the infinitesimal generator of $M$, and the cost 
parameters $c_i$, $\mu_i$. Related models have appeared in [29] and [37], and go all the way back to the classical 
POMDP work by [36]. In this paper, we consider that problem without any parameter assumptions 
and with the additional finite horizon constraint $\tau \le T$. 

1.2. Problem description: a unifying framework. In the examples above, a DM observes 
a compound Poisson process $X$ with arrival rate $\lambda$ and mark/jump distribution $\nu$. The local 
characteristics $(\lambda, \nu)$ of $X$ are determined by the current state of an unobservable finite-state Markov 
process $M$. 

At any time $\tau$ less than some $T < \infty$, the DM can stop and select an action $k$ from the set 
$\mathcal{A} = \{1, \ldots, a\}$. If action $k \in \mathcal{A}$ is taken, this yields a terminal reward/payoff of 

$$\sum_{i \in E} \mu_{k,i} \cdot 1_{\{M_\tau = i\}}$$

as a function of the unobservable state of $M$. Here, $\mu_{k,i}$ is a given finite (not necessarily positive) 
number. One can also interpret $\mu_{k,i}$ as the expected value of an independent random variable 
representing the uncertain payoff of taking action $k$ when $M_\tau = i$. Also note that if there is a 
time lag between the decision and its realization, and if this delay is independent, then $\mu_{k,i}$ can be 
assumed to be the expected discounted value of this payoff. 

The DM may alternatively delay her decision and continue to observe the process X in order to 
collect more information, or in order to stop later when M appears to be in a better state. Delaying 
the decision carries associated costs (rewards) due to the cost of observation or lost opportunity 
(or operating revenues). We allow these terms to depend on $M$ and we assume that an amount 
with present value 

$$\int_0^{\tau} e^{-\rho t} \left( \sum_{i \in E} c_i \cdot 1_{\{M_t = i\}} \right) dt$$

is accumulated until the decision time $\tau$. Here $\rho \ge 0$ is the discount rate, and $c_i$ is the instantaneous 
cost or revenue of running the system when $M$ is at state $i \in E$. We allow $\rho$ to be zero. 
This makes the formulation suitable for non-financial applications where the quality of the decision 
is more important than its timing. 

In this setup, the objective of the DM is to find an admissible strategy that will maximize her total 
expected reward and resolve the trade-off between exploring (getting more observations) and exploiting 
(engaging in an action). An admissible strategy is a pair $(\tau, d)$, where $\tau \le T$ is the decision 
time and $d \in \mathcal{A}$ is the action selected at this time. Since the DM collects information from observing 
$X$, the value of $\tau$ should be determined by the information generated by $X$; namely, $\tau$ must be a stopping 
time of the filtration $\mathbb{F}^X$ of $X$. Also, the decision variable $d$ should be measurable with respect 
to the information $\mathcal{F}^X_\tau$ revealed by $X$ until $\tau$. Let $\vec{\pi} = (\pi_1, \ldots, \pi_n) = (\mathbb{P}(M_0 = 1), \ldots, \mathbb{P}(M_0 = n))$ 
be the initial (prior) beliefs of the DM about $M$ and $\mathbb{P}^{\vec{\pi}}$ the corresponding conditional probability 



(1.4) 
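law. Then the objective of the DM is to compute 

$$U(T, \vec{\pi}) = \sup_{\tau \le T,\; d \in \mathcal{F}^X_\tau} \mathbb{E}^{\vec{\pi}} \left[ \int_0^{\tau} e^{-\rho t} \left( \sum_{i \in E} c_i 1_{\{M_t = i\}} \right) dt + e^{-\rho \tau} \sum_{k \in \mathcal{A}} 1_{\{d = k\}} \left( \sum_{i \in E} \mu_{k,i} \cdot 1_{\{M_\tau = i\}} \right) \right] \tag{1.5}$$

and, if it exists, find an admissible pair $(\tau, d)$ attaining this value. 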

In Section 2 below we describe the formal setting of our model and show that the problem in 
(1.5) is equivalent to an optimal stopping problem in terms of the conditional probability process, 
which is a piecewise deterministic process. Section 3 describes how the value function of this 
stopping problem can be computed via a sequential procedure. The results of Section 3 are used 
in Section 4 in order to identify an optimal strategy and describe its properties. Following this, 
Section 5 explores alternative objective functions that can be employed in our framework. Finally, 
in Section 6 we give numerical examples illustrating our results. Most of the proofs are delegated 
to the Appendices at the end. 

2. Problem Statement 

2.1. Model. Let $(\Omega, \mathcal{H}, \mathbb{P})$ be a probability space hosting a continuous-time Markov process $M$ 
taking values in $E = \{1, \ldots, n\}$, for $n \in \mathbb{N}$, and with infinitesimal generator $Q = (q_{ij})_{i,j \in E}$. 
Also, we have a collection of independent compound Poisson processes $X^{(1)}, \ldots, X^{(n)}$ with local 
parameters $(\lambda_1, \nu_1), \ldots, (\lambda_n, \nu_n)$ respectively. In terms of these independent processes, we define 
the observation process 

$$X_t \triangleq X_0 + \int_0^t \sum_{i \in E} 1_{\{M_s = i\}} \, dX^{(i)}_s, \quad t \ge 0, \tag{2.1}$$

which is a Markov-modulated Poisson process, also called a Cox process (see [8]). In the remainder, 
we let $\sigma_0, \sigma_1, \ldots$ denote the arrival times of the process $X$: 

$$\sigma_m = \inf\{t > \sigma_{m-1} : X_t \ne X_{t-}\}, \quad m \ge 1, \quad \text{with } \sigma_0 = 0,$$

and the variables $Y_1, Y_2, \ldots$ denote the $\mathbb{R}^d$-valued marks observed at these arrival times: 

$$Y_m = X_{\sigma_m} - X_{\sigma_m -}, \quad m \ge 1.$$

Finally, to compute relative likelihoods of different marks, we introduce the total measure $\nu$ defined 
as $\nu = \nu_1 + \ldots + \nu_n$, and we let $f_i(\cdot)$ be the density of $\nu_i$ with respect to $\nu$. 
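The construction of the observation process can be made concrete with a short simulation. The following Python sketch is illustrative only and is not part of the original paper: the function name `simulate_observations`, the nested-list representation of the generator $Q$, and the user-supplied mark samplers are all assumptions. It relies on the standard competing-exponentials argument: while $M$ sits in state $i$, the next event (a transition of $M$ or an arrival of $X$) occurs at rate $\lambda_i - q_{ii}$, and it is an arrival with probability $\lambda_i / (\lambda_i - q_{ii})$.

```python
import random

def simulate_observations(Q, lam, mark_samplers, pi0, horizon, rng):
    """Simulate the hidden chain M and the arrivals (sigma_m, Y_m) of X on [0, horizon].

    Q is the generator of M as a nested list, lam[i] the arrival rate in state i,
    mark_samplers[i](rng) draws a mark from nu_i, and pi0 is the law of M_0.
    All names are hypothetical; this is a sketch, not the authors' code.
    """
    n = len(lam)
    state = rng.choices(range(n), weights=pi0)[0]  # draw M_0 from the prior
    t = 0.0
    path = [(0.0, state)]   # (transition time of M, new state)
    arrivals = []           # (arrival time sigma_m, mark Y_m)
    while True:
        exit_rate = -Q[state][state]
        total = exit_rate + lam[state]
        if total <= 0.0:
            break  # absorbing state that generates no arrivals
        t += rng.expovariate(total)  # time of the next event of either kind
        if t > horizon:
            break
        if rng.random() < lam[state] / total:
            # The event is an arrival of X; its mark depends on the hidden state.
            arrivals.append((t, mark_samplers[state](rng)))
        else:
            # The event is a transition of M, chosen proportionally to q_{state, j}.
            weights = [Q[state][j] if j != state else 0.0 for j in range(n)]
            state = rng.choices(range(n), weights=weights)[0]
            path.append((t, state))
    return path, arrivals
```

Only the `arrivals` list would be visible to the DM; the `path` of $M$ is returned solely so that a simulated filter can be compared against the truth.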

2.2. Conditional probability process. For a point $\vec{\pi}$ in $D = \{\vec{\pi} \in \mathbb{R}^n_+ : \pi_1 + \ldots + \pi_n = 1\}$, let 
$\mathbb{P}^{\vec{\pi}}$ denote the probability measure (with the expectation operator $\mathbb{E}^{\vec{\pi}}$) under which $M$ has initial 
distribution $\vec{\pi}$. Moreover, let $\mathbb{F} = \{\mathcal{F}^X_t\}_{t \ge 0}$ be the filtration of the process $X$ in (2.1). With this 
notation, we define the $D$-valued conditional probability process $\vec{\Pi}_t = (\Pi_1(t), \ldots, \Pi_n(t))$ such that 

$$\Pi_i(t) = \mathbb{P}^{\vec{\pi}}\{M_t = i \mid \mathcal{F}^X_t\}, \quad \text{for } i \in E \text{ and } t \ge 0. \tag{2.2}$$

The process $\vec{\Pi}$ is clearly adapted to $\mathbb{F}$, and each component gives the conditional probability that the 
current state of $M$ is $i$ given the information generated by $X$ until the current time $t$. Moreover, 
using standard arguments as in [35, pp. 166-167] and [12, Proof of Proposition 2.1], it can be 
shown that the problem in (1.5) is equivalent to a fully observed optimal stopping problem with 
the process $\vec{\Pi}$ as the new hyperstate. More precisely, the value function $U$ in (1.5) can be written 
as 

$$U(T, \vec{\pi}) = V(T, \vec{\pi}) \triangleq \sup_{\tau \le T} \mathbb{E}^{\vec{\pi}} \left[ \int_0^{\tau} e^{-\rho t} C(\vec{\Pi}_t) \, dt + e^{-\rho \tau} H(\vec{\Pi}_\tau) \right] \tag{2.3}$$

in terms of the functions 

$$C(\vec{\pi}) \triangleq \sum_{i \in E} c_i \pi_i \quad \text{and} \quad H(\vec{\pi}) \triangleq \max_{k \in \mathcal{A}} H_k(\vec{\pi}), \quad \text{where } H_k(\vec{\pi}) \triangleq \sum_{i \in E} \mu_{k,i} \pi_i. \tag{2.4}$$

If there is a stopping time $\tau^*$ attaining the supremum in (2.3), then the admissible strategy 
$(\tau^*, d(\tau^*))$ is an optimal rule for the problem in (1.5) if we define 

$$d(\tau) \in \arg\max_{k \in \mathcal{A}} H_k(\vec{\Pi}_\tau). \tag{2.5}$$
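Once the belief vector is available, the running reward, the terminal reward and the action rule above reduce to a few inner products. The sketch below assumes that the functions in (2.4) read $C(\vec{\pi}) = \sum_i c_i \pi_i$ and $H(\vec{\pi}) = \max_k \sum_i \mu_{k,i} \pi_i$; the function names and the list-of-lists layout `mu[k][i]` are hypothetical choices made for illustration.

```python
def running_rate(c, pi):
    """C(pi): expected instantaneous cost or revenue under the belief pi."""
    return sum(ci * p for ci, p in zip(c, pi))

def action_values(mu, pi):
    """H_k(pi) for every action k, where mu[k][i] is the payoff of action k in state i."""
    return [sum(m * p for m, p in zip(row, pi)) for row in mu]

def terminal_reward(mu, pi):
    """H(pi): expected payoff of the best action under the belief pi."""
    return max(action_values(mu, pi))

def best_action(mu, pi):
    """An index attaining the maximum in H(pi), as in the action rule (2.5)."""
    values = action_values(mu, pi)
    return max(range(len(values)), key=values.__getitem__)
```

For instance, with two states and the two actions "quit" (payoff 0 in both states) and "launch" (payoffs 2 and -1), the belief (0.5, 0.5) favours launching, whereas (0.2, 0.8) favours quitting.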



2.3. Sample paths of $\vec{\Pi}$. Let us take a sample path of the observation process $X$, in which 
$m$ arrivals are observed on $[0, t]$. Let $(t_k)_{k \le m}$ denote those arrival times. If we know that the 
process $M$ stays at the state $i$ without any transition, then the (conditional) likelihood of this 
path would be written as 

$$\mathbb{P}^{\vec{\pi}}\{\sigma_k \in dt_k, Y_k \in dy_k;\ k \le m \mid M_s = i,\ s \le t\} = [\lambda_i e^{-\lambda_i t_1} dt_1] \cdots [\lambda_i e^{-\lambda_i (t_m - t_{m-1})} dt_m] \, e^{-\lambda_i (t - t_m)} \prod_{k=1}^{m} [f_i(y_k) \nu(dy_k)] = e^{-\lambda_i t} \prod_{k=1}^{m} \lambda_i \, dt_k \cdot f_i(y_k) \nu(dy_k).$$

By construction, the observation process $X$ has independent increments conditioned on $M = \{M_t\}_{t \ge 0}$. 
Therefore, we have 

$$1_{\{M_t = i\}} \cdot \mathbb{P}^{\vec{\pi}}\{\sigma_k \in dt_k, Y_k \in dy_k;\ k \le m \mid M_s;\ s \le t\} = 1_{\{M_t = i\}} \cdot \exp\left( -\int_0^t \sum_{j=1}^{n} \lambda_j 1_{\{M_s = j\}} \, ds \right) \cdot \prod_{k=1}^{m} \left( \sum_{j \in E} 1_{\{M_{t_k} = j\}} [\lambda_j \, dt_k \cdot f_j(y_k) \nu(dy_k)] \right). \tag{2.6}$$

By taking the expectations of the expressions above, we obtain the unconditional likelihoods, in 
terms of which we give an explicit representation for the process $\vec{\Pi}$ in Lemma 2.1 below. 



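Lemma 2.1. For $i \in E$, let us define 

$$L_i^{\vec{\pi}}(t, m : (t_k, y_k), k \le m) \triangleq \mathbb{E}^{\vec{\pi}} \left[ 1_{\{M_t = i\}} \cdot e^{-I(t)} \cdot \prod_{k=1}^{m} \ell(t_k, y_k) \right], \tag{2.7}$$

where 

$$I(t) \triangleq \int_0^t \sum_{i=1}^{n} \lambda_i 1_{\{M_s = i\}} \, ds \quad \text{and} \quad \ell(t, y) \triangleq \sum_{j \in E} 1_{\{M_t = j\}} \lambda_j \cdot f_j(y). \tag{2.8}$$

Also, let $L^{\vec{\pi}}(t, m : (t_k, y_k), k \le m) \triangleq \sum_{i \in E} L_i^{\vec{\pi}}(t, m : (t_k, y_k), k \le m)$. Then we have 

$$\Pi_i(t) = \frac{L_i^{\vec{\pi}}(t, N_t : (\sigma_k, Y_k), k \le N_t)}{L^{\vec{\pi}}(t, N_t : (\sigma_k, Y_k), k \le N_t)} \equiv \left. \frac{L_i^{\vec{\pi}}(t, m : (t_k, y_k), k \le m)}{L^{\vec{\pi}}(t, m : (t_k, y_k), k \le m)} \right|_{m = N_t;\ (t_k = \sigma_k,\, y_k = Y_k)_{k \le m}} \tag{2.9}$$

$\mathbb{P}^{\vec{\pi}}$-a.s., for all $t \ge 0$, and for $i \in E$. 

Lemma 2.1 indicates that the conditional probability of $M_t$ being in state $i$ is simply the (unconditional) 
relative likelihood of the observed path until $t$ on the event $\{M_t = i\}$. Using the explicit 
form in (2.9), we describe the behavior of the sample paths of $\vec{\Pi}$ in Remark 2.1 below. 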
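A special case makes the likelihood-ratio representation tangible. When every state of $M$ is absorbing (the static regime of the Bayesian regime detection example), the hidden state never moves, so the likelihood attached to state $i$ reduces to $\pi_i \, e^{-\lambda_i t} \prod_{k} \lambda_i f_i(y_k)$, and the posterior is this quantity normalized over $i$. The Python sketch below implements only this static case; the function name and the representation of the densities $f_i$ as callables are assumptions made for illustration.

```python
import math

def static_posterior(prior, lam, densities, t, marks):
    """Posterior over n static hypotheses after observing `marks` on [0, t].

    prior[i] is pi_i, lam[i] is lambda_i, and densities[i](y) evaluates f_i(y),
    the density of nu_i with respect to nu = nu_1 + ... + nu_n.
    Valid only when M never changes state (all states absorbing).
    """
    weights = []
    for p, rate, f in zip(prior, lam, densities):
        likelihood = p * math.exp(-rate * t)   # no-arrival factor e^{-lambda_i t}
        for y in marks:
            likelihood *= rate * f(y)          # one factor lambda_i f_i(y_k) per arrival
        weights.append(likelihood)
    total = sum(weights)
    return [w / total for w in weights]
```

Note that the arrival times themselves cancel in the static case: only the elapsed time $t$, the number of arrivals and their marks matter. With a non-trivial generator $Q$ this is no longer true, and the expectation over the paths of $M$ has to be evaluated recursively.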

Remark 2.1. The process $\vec{\Pi}$ has piecewise-deterministic sample paths: between two arrival times 
of $X$, it moves deterministically, and at an arrival time, it jumps from one point to another depending 
on the observed mark size (see Figure 1). In precise terms, the sample paths have the 
characterization 

$$\begin{cases} \vec{\Pi}_t = \vec{x}\left(t - \sigma_m, \vec{\Pi}_{\sigma_m}\right), & \sigma_m \le t < \sigma_{m+1},\ m \in \mathbb{N}, \\[4pt] \vec{\Pi}_{\sigma_m} = \left( \dfrac{\lambda_1 f_1(Y_m) \Pi_1(\sigma_m -)}{\sum_{j \in E} \lambda_j f_j(Y_m) \Pi_j(\sigma_m -)}, \ldots, \dfrac{\lambda_n f_n(Y_m) \Pi_n(\sigma_m -)}{\sum_{j \in E} \lambda_j f_j(Y_m) \Pi_j(\sigma_m -)} \right), \end{cases} \tag{2.10}$$

where $\vec{x}(t, \vec{\pi}) \equiv (x_1(t, \vec{\pi}), \ldots, x_n(t, \vec{\pi}))$ is defined as 

$$x_i(t, \vec{\pi}) \triangleq \frac{\mathbb{P}^{\vec{\pi}}\{\sigma_1 > t, M_t = i\}}{\mathbb{P}^{\vec{\pi}}\{\sigma_1 > t\}} = \frac{\mathbb{E}^{\vec{\pi}}\left[ 1_{\{M_t = i\}} \cdot e^{-I(t)} \right]}{\mathbb{E}^{\vec{\pi}}\left[ e^{-I(t)} \right]}, \quad \text{for } i \in E, \tag{2.11}$$

and satisfies the semigroup property $\vec{x}(t + u, \vec{\pi}) = \vec{x}(u, \vec{x}(t, \vec{\pi}))$, for $t, u \ge 0$. 

Figure 1. Sample paths of the process $\vec{\Pi}$ for different examples. Solid lines represent 
actual sample paths. Dashed lines in panels (c) and (d) are the deterministic 
parts in (2.11). In panels (a) and (b), there are two hidden states, and in panels 
(c) and (d), there are three. In each example, jumps of the process $X$ are always of 
unit size. The parameters of each example: [matrix-valued parameters not recoverable], 
with $\vec{\lambda}_a = [1, 2]$, $\vec{\lambda}_b = [1, 4]$, $\vec{\lambda}_c = [1, 2, 3]$, $\vec{\lambda}_d = [1, 3, 5]$. 

The $i$'th component $x_i(\cdot, \cdot)$ indicates how likely it is to have a period of $[0, t]$ without any arrival 
on the event $\{M_t = i\}$, as expected. Moreover, for $0 \le u_1 \le u_2 \le \cdots \le u_k$ and for a bounded 



FINITE HORIZON DECISION TIMING WITH PARTIALLY OBSERVABLE POISSON PROCESSES 9 

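function $g(\cdot)$, we have 

$$\begin{aligned} \mathbb{E}^{\vec{\pi}}\left[ g(X_{t+u_1} - X_t, \ldots, X_{t+u_k} - X_t) \mid \mathcal{F}^X_t \right] &= \sum_{j \in E} \Pi_j(t) \cdot \mathbb{E}^{\vec{\pi}}\left[ g(X_{t+u_1} - X_t, \ldots, X_{t+u_k} - X_t) \mid \mathcal{F}^X_t, M_t = j \right] \\ &= \sum_{j \in E} \Pi_j(t) \cdot \mathbb{E}^{\vec{\pi}}\left[ g(X_{u_1}, \ldots, X_{u_k}) \mid M_0 = j \right] = \mathbb{E}^{\vec{\Pi}_t}\left[ g(X_{u_1}, \ldots, X_{u_k}) \right], \end{aligned} \tag{2.12}$$

where the first equality in the last line follows from the construction of the process $X$ in (2.1). 
The equation (2.12) together with the characterization in (2.10) implies that $\vec{\Pi}$ is a $(\mathbb{P}^{\vec{\pi}}, \mathbb{F})$-Markov 
process for every $\vec{\pi} \in D$. 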

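Corollary 2.1. Using infinitesimal last step analysis, it can be shown (see, for example, [9, page 
416], and [25, Chapter 6.7]) that the vector 

$$\vec{m}(t, \vec{\pi}) \equiv (m_1(t, \vec{\pi}), \ldots, m_n(t, \vec{\pi})) \triangleq \left( \mathbb{E}^{\vec{\pi}}\left[ 1_{\{M_t = 1\}} \cdot e^{-I(t)} \right], \ldots, \mathbb{E}^{\vec{\pi}}\left[ 1_{\{M_t = n\}} \cdot e^{-I(t)} \right] \right) \tag{2.13}$$

has the form $\vec{m}(t, \vec{\pi}) = \vec{\pi} \cdot e^{t(Q - \Lambda)}$, where $\Lambda$ is the $n \times n$ diagonal matrix with $\Lambda_{i,i} = \lambda_i$, and the 
components of $\vec{m}(t, \vec{\pi})$ solve $dm_i(t, \vec{\pi})/dt = -\lambda_i m_i(t, \vec{\pi}) + \sum_{j \in E} m_j(t, \vec{\pi}) \cdot q_{j,i}$. Then together with 
the chain rule and (2.11) we obtain 

$$\frac{dx_i(t, \vec{\pi})}{dt} = \left( \sum_{j \in E} q_{j,i} \, x_j(t, \vec{\pi}) - \lambda_i x_i(t, \vec{\pi}) + x_i(t, \vec{\pi}) \sum_{j \in E} \lambda_j x_j(t, \vec{\pi}) \right). \tag{2.14}$$

Hence, the process $\vec{\Pi}$ in (2.10) has the dynamics 

$$d\Pi_i(t) = \left( \sum_{j \in E} q_{j,i} \Pi_j(t) - \lambda_i \Pi_i(t) + \Pi_i(t) \sum_{j \in E} \lambda_j \Pi_j(t) \right) dt + \int_{\mathbb{R}^d} \left( \frac{\lambda_i f_i(y) \Pi_i(t-)}{\sum_{j \in E} \lambda_j f_j(y) \Pi_j(t-)} - \Pi_i(t-) \right) p(dt, dy), \quad i \in E, \tag{2.15}$$

where $p(\cdot, \cdot)$ is the point process generated by $X$; that is, 

$$p((0, t] \times B) \triangleq \sum_{i=1}^{\infty} 1_{(0, t] \times B}(\sigma_i, Y_i), \quad \text{for every Borel set } B \in \mathcal{B}(\mathbb{R}^d) \text{ and } t \ge 0.$$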
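The two ingredients of the filter, the deterministic flow between arrivals and the Bayes jump at an arrival, can be sketched in a few lines of Python. The sketch assumes the representation $\vec{m}(t, \vec{\pi}) = \vec{\pi} \, e^{t(Q - \Lambda)}$ stated above, with $\vec{x} = \vec{m} / \sum_j m_j$, and the jump map $\pi_i \mapsto \lambda_i f_i(y) \pi_i / \sum_j \lambda_j f_j(y) \pi_j$. The function names, and the small pure-Python matrix exponential (scaling and squaring with a truncated Taylor series), are illustrative assumptions rather than the authors' implementation; in practice a library routine for the matrix exponential would normally be preferred.

```python
import math

def mat_mul(A, B):
    """Product of two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm(A, terms=20):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    n = len(A)
    norm = max(sum(abs(a) for a in row) for row in A)
    squarings = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scaled = [[a / 2 ** squarings for a in row] for row in A]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[r + s for r, s in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def flow(pi, t, Q, lam):
    """x(t, pi): the belief after t time units without any arrival."""
    n = len(pi)
    A = [[t * (Q[i][j] - (lam[i] if i == j else 0.0)) for j in range(n)]
         for i in range(n)]
    E = expm(A)
    m = [sum(pi[i] * E[i][j] for i in range(n)) for j in range(n)]  # m = pi * exp(t(Q - Lambda))
    total = sum(m)
    return [mj / total for mj in m]

def jump(pi, lam, f_at_y):
    """Belief right after an arrival with mark y, where f_at_y[i] = f_i(y)."""
    w = [rate * f * p for rate, f, p in zip(lam, f_at_y, pi)]
    total = sum(w)
    return [x / total for x in w]
```

Alternating `flow` over the inter-arrival times and `jump` at the observed marks reproduces a sample path of the belief process. A useful sanity check is the semigroup property: `flow(flow(pi, t, Q, lam), u, Q, lam)` should agree with `flow(pi, t + u, Q, lam)` up to rounding.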

3. Constructing the Value Function 

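The characterization of the sample paths in (2.15) and the general theory of optimal stopping (see, 
for example, [4, 26]) imply that the free-boundary problem associated with the optimal stopping 
problem in (2.3) has the form 

$$\max\left\{ (-\rho + \mathcal{L}) V(s, \vec{\pi}) + C(\vec{\pi});\ H(\vec{\pi}) - V(s, \vec{\pi}) \right\} = 0, \tag{3.1}$$

in terms of the infinitesimal generator 

$$\mathcal{L}V(s, \vec{\pi}) = -\frac{\partial V(s, \vec{\pi})}{\partial s} + \sum_{i \in E} \frac{\partial V(s, \vec{\pi})}{\partial \pi_i} \left( \sum_{j \in E} q_{j,i} \pi_j - \lambda_i \pi_i + \pi_i \sum_{j \in E} \lambda_j \pi_j \right) + \int_{\mathbb{R}^d} \left[ V\left( s, \frac{\lambda_1 \pi_1 f_1(y)}{\sum_{j \in E} \lambda_j \pi_j f_j(y)}, \ldots, \frac{\lambda_n \pi_n f_n(y)}{\sum_{j \in E} \lambda_j \pi_j f_j(y)} \right) - V(s, \vec{\pi}) \right] \sum_{i \in E} \pi_i \lambda_i \nu_i(dy)$$

of the process $\vec{\Pi}$. The infinitesimal generator $\mathcal{L}$ is a partial differential-difference operator on 
$[0, T] \times D$. Hence, solving the equation $(-\rho + \mathcal{L})V(s, \vec{\pi}) + C(\vec{\pi}) = 0$ and determining the 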



10 MICHAEL LUDKOVSKI AND SEMIH O. SEZER 

boundary of the region $\{\vec{\pi} \in D : V(T, \vec{\pi}) = H(\vec{\pi})\}$ is not easy even when $n = 2$; see, for example, 
[32] who solve free-boundary problems similar to (3.1) for infinite horizon problems with $n = 2$. 

Instead of studying the problem in (3.1), we will employ a sequential approximation technique 
to compute the value function following [20] and [10, Chapter 5]. A similar approach is also taken in 
[3] and [12] for disorder-detection and hypothesis-testing problems, respectively, on an infinite horizon. 
Since our problem is on a finite horizon, we work with time-dependent operators, and this requires 
non-trivial modifications of their arguments. The method is described in the sequel, and the proofs 
are given in the Appendix. 

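3.1. A sequential approximation. Let us first define the functions 

$$\begin{aligned} V(s, \vec{\pi}) &\triangleq \sup_{\tau \le s} \mathbb{E}^{\vec{\pi}} \left[ \int_0^{\tau} e^{-\rho t} C(\vec{\Pi}_t) \, dt + e^{-\rho \tau} H(\vec{\Pi}_\tau) \right], \quad \text{and} \\ V_m(s, \vec{\pi}) &\triangleq \sup_{\tau \le s} \mathbb{E}^{\vec{\pi}} \left[ \int_0^{\tau \wedge \sigma_m} e^{-\rho t} C(\vec{\Pi}_t) \, dt + e^{-\rho (\tau \wedge \sigma_m)} H(\vec{\Pi}_{\tau \wedge \sigma_m}) \right], \quad \text{for } m \in \mathbb{N}, \text{ on } [0, T] \times D, \end{aligned} \tag{3.2}$$

where the first argument $s$ should be considered as the remaining time to maturity. 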

Proposition 3.1 below shows that the $V_m$'s converge to $V$ uniformly; see also the proof of [10, Theorem 
(53.40)] and [12, Proposition 3.1] for related results. Proposition 3.1 is a generalization of these 
results to the finite horizon case. 

Proposition 3.1. The sequence $\{V_m\}_{m \ge 1}$ converges to $V$ uniformly on $[0, T] \times D$. More precisely, 
we have 

$$V_m(s, \vec{\pi}) \le V(s, \vec{\pi}) \le V_m(s, \vec{\pi}) + \left( T \|C\| + 2 \|H\| \right) \left( \frac{\bar{\lambda} T}{m - 1} \right)^{1/2} \left( \frac{\bar{\lambda}}{2\rho + \bar{\lambda}} \right)^{m/2} \tag{3.3}$$

for all $(s, \vec{\pi}) \in [0, T] \times D$ and $m \in \mathbb{N}$, where $\|C\| = \max_{\vec{\pi} \in D} |C(\vec{\pi})|$, $\|H\| = \max_{\vec{\pi} \in D} |H(\vec{\pi})|$ and 
$\bar{\lambda} = \max_{i \in E} \lambda_i$. 
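The uniform bound gives a practical stopping rule for the approximation: one can pick the number of arrivals $m$ to look ahead so that the gap is below a prescribed tolerance. The following Python sketch assumes that the bound reads $(T\|C\| + 2\|H\|)\,(\bar{\lambda} T/(m-1))^{1/2}\,(\bar{\lambda}/(2\rho + \bar{\lambda}))^{m/2}$, which is our reading of the damaged source and should be checked against the original paper; the function names are hypothetical.

```python
import math

def approximation_gap(m, T, c_norm, h_norm, lam_bar, rho):
    """Upper bound on V - V_m, under the assumed form of the bound (requires m >= 2)."""
    if m < 2:
        raise ValueError("the bound is only informative for m >= 2")
    scale = T * c_norm + 2.0 * h_norm
    return (scale
            * math.sqrt(lam_bar * T / (m - 1))
            * (lam_bar / (2.0 * rho + lam_bar)) ** (m / 2.0))

def arrivals_needed(tol, T, c_norm, h_norm, lam_bar, rho):
    """Smallest m >= 2 whose bound does not exceed tol."""
    m = 2
    while approximation_gap(m, T, c_norm, h_norm, lam_bar, rho) > tol:
        m += 1
    return m
```

With a positive discount rate the bound decays geometrically in $m$. With $\rho = 0$ only the factor $(m-1)^{-1/2}$ remains, so the number of iterations suggested by the bound grows quickly as the tolerance shrinks.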

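Let us consider the second problem in (3.2) for fixed $m \in \mathbb{N}$, and let $\tau \le s$ be an $\mathbb{F}$-stopping 
time. Then, the dynamic programming intuition suggests that $V_m(\cdot)$ should solve the equation 
$V_m(s, \vec{\pi}) = J_0 V_{m-1}(s, \vec{\pi})$, where the operator $J_0$ is defined as 

$$J_0 w(s, \vec{\pi}) \triangleq \sup_{\tau \le s} \mathbb{E}^{\vec{\pi}} \left[ \int_0^{\tau \wedge \sigma_1} e^{-\rho t} C(\vec{\Pi}_t) \, dt + 1_{\{\tau < \sigma_1\}} e^{-\rho \tau} H(\vec{\Pi}_\tau) + 1_{\{\sigma_1 \le \tau\}} e^{-\rho \sigma_1} w(s - \sigma_1, \vec{\Pi}_{\sigma_1}) \right] \tag{3.4}$$

for a bounded function $w : [0, T] \times D \mapsto \mathbb{R}$. 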

The following characterization of $\mathbb{F}$-stopping times is from [6, Theorem T33, p. 308] and [10, 
Lemma A2.3, p. 261]. 

Lemma 3.1. For every $\mathbb{F}$-stopping time $\tau$ (bounded as $\tau \le s \le T$), and for every $m \in \mathbb{N}$, there exists 
an $\mathcal{F}^X_{\sigma_m}$-measurable random variable $R_m$ such that $\tau \wedge \sigma_{m+1} = (\sigma_m + R_m) \wedge \sigma_{m+1}$, $\mathbb{P}^{\vec{\pi}}$-almost surely 
on $\{\tau \ge \sigma_m\}$. 

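Lemma 3.1 implies that the supremum in (3.4) can equivalently be taken over deterministic 
times, in which case the same problem becomes 

$$V_m(s, \vec{\pi}) = J_0 V_{m-1}(s, \vec{\pi}) = \sup_{t \in [0, s]} J V_{m-1}(t, s, \vec{\pi}), \tag{3.5}$$

where the operator $J$ has the form 

$$J w(t, s, \vec{\pi}) \triangleq \mathbb{E}^{\vec{\pi}} \left[ \int_0^{t \wedge \sigma_1} e^{-\rho u} C(\vec{\Pi}_u) \, du + 1_{\{t < \sigma_1\}} e^{-\rho t} H(\vec{\Pi}_t) + 1_{\{\sigma_1 \le t\}} e^{-\rho \sigma_1} w(s - \sigma_1, \vec{\Pi}_{\sigma_1}) \right]. \tag{3.6}$$

Note that, with the notation in (2.13), we have 

$$\mathbb{P}^{\vec{\pi}}\{\sigma_1 > u\} = \mathbb{E}^{\vec{\pi}}\left[ e^{-I(u)} \right] \quad \text{and} \quad \mathbb{P}^{\vec{\pi}}\{\sigma_1 \in du, M_u = i\} = \mathbb{E}^{\vec{\pi}}\left[ \lambda_i 1_{\{M_u = i\}} e^{-I(u)} \right] du = \lambda_i m_i(u, \vec{\pi}) \, du,$$

and using the characterization of the paths in (2.10) and (2.14) the operator $J$ in (3.6) can be 
rewritten as 

$$J w(t, s, \vec{\pi}) = \mathbb{E}^{\vec{\pi}}\left[ e^{-I(t)} \right] \cdot e^{-\rho t} \cdot H(\vec{x}(t, \vec{\pi})) + \int_0^t e^{-\rho u} \sum_{i \in E} m_i(u, \vec{\pi}) \cdot \Big( C(\vec{x}(u, \vec{\pi})) + \lambda_i \cdot S_i w(s - u, \vec{x}(u, \vec{\pi})) \Big) du, \tag{3.7}$$

in terms of the operators 

$$S_i w(t, \vec{\pi}) \triangleq \int_{\mathbb{R}^d} w\left( t, \frac{\lambda_1 f_1(y) \pi_1}{\sum_{j \in E} \lambda_j f_j(y) \pi_j}, \ldots, \frac{\lambda_n f_n(y) \pi_n}{\sum_{j \in E} \lambda_j f_j(y) \pi_j} \right) f_i(y) \nu(dy), \quad \text{for } i \in E. \tag{3.8}$$

The following lemmas provide basic properties of the operator $J_0$. 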

Lemma 3.2. Ifw(-,-) is bounded, then so is Jow(-,-) on [0,T] x D. Ifw±(-,-) < 102 (•,•), then 
JqWx(-,-) < JqW2(-,-)- Moreover, if the mapping tt h-> w(s,tt) is convex for each s G [0, T], so is 
tt i y Jqw(s, tt) /or eac/i s G [0, T]. 

Remark 3.1. For a bounded continuous function w(-, •) on [0, T] x D, the mapping t — > Jw(t, s, 7?) 
is continuous on [0, s] and sup t£ r UiS i Jw(t, s, tt) is attained for all u G [0, s]. 

Lemma 3.3. The operator Jq preserves the continuity. That is, if w(-, ■) is a continuous function 
defined on [0,T] x D, then J$w{-, ■) is also continuous. 

Let us now define the sequence

(3.9)  $v_0(s,\vec\pi) = H(\vec\pi)$, and $v_{m+1}(s,\vec\pi) = J_0 v_m(s,\vec\pi)$, for $m \ge 0$, on $[0,T] \times D$.

Lemma 3.4. The sequence $\{v_m(\cdot,\cdot)\}_{m \in \mathbb{N}}$ is non-decreasing, hence the pointwise limit $v(\cdot,\cdot) = \sup_{m \in \mathbb{N}} v_m(\cdot,\cdot)$ is well defined on $[0,T] \times D$. Each $v_m(\cdot,\cdot)$ is bounded and continuous on $[0,T] \times D$, and the mapping $\vec\pi \mapsto v_m(s,\vec\pi)$ is convex for each $s \in [0,T]$.

Proof. Note that $v_1(s,\vec\pi) = J_0 v_0(s,\vec\pi) = J_0 H(s,\vec\pi) = \sup_{t \in [0,s]} JH(t,s,\vec\pi) \ge JH(0,s,\vec\pi) = H(\vec\pi)$. Let us assume that $v_m \ge v_{m-1}$ for some $m \in \mathbb{N}$. Then we get $v_{m+1}(s,\vec\pi) = J_0 v_m(s,\vec\pi) \ge J_0 v_{m-1}(s,\vec\pi) = v_m(s,\vec\pi)$, where the inequality follows from Lemma 3.2. Hence, the sequence is non-decreasing by induction.

The claims on continuity, boundedness and convexity clearly hold for $v_0(\cdot,\cdot) = H(\cdot)$. Then using Lemmas 3.2 and 3.3 it can be verified inductively that these properties also hold for each $v_m$. □

Proposition 3.2. The sequences defined in (3.2) and (3.9) coincide. That is, we have $v_m(\cdot,\cdot) = V_m(\cdot,\cdot)$ for every $m \in \mathbb{N}$.

Corollary 3.1. Propositions 3.1 and 3.2 imply $v(\cdot,\cdot) = \lim_{m \in \mathbb{N}} v_m(\cdot,\cdot) = \lim_{m \in \mathbb{N}} V_m(\cdot,\cdot) = V(\cdot,\cdot)$. By Lemma 3.4, each $V_m(\cdot,\cdot)$ is continuous on $[0,T] \times D$. Then, the uniform convergence in Proposition 3.1 implies that $V(\cdot,\cdot)$ is also continuous. Finally, as the upper envelope of the convex mappings $\vec\pi \mapsto v_m(s,\vec\pi) = V_m(s,\vec\pi)$, the mapping $\vec\pi \mapsto V(s,\vec\pi)$ is again convex for each $s \in [0,T]$.

Proposition 3.3 below characterizes the value function $V(\cdot,\cdot)$ as a fixed point of the operator $J_0$ defined in (3.5-3.7). This fixed-point relation can also be thought of as the dynamic programming equation for the value function $V(\cdot,\cdot)$.

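Proposition 3.3. The value function satisfies $V(s,\vec\pi) = J_0 V(s,\vec\pi)$, and it is the smallest bounded solution of this equation greater than $H(\cdot)$.

Proof. Using Lemma 3.4 and Corollary 3.1 we get

$V(s,\vec\pi) = v(s,\vec\pi) = \sup_{n \ge 1} v_n(s,\vec\pi) = \sup_{n \ge 1} \sup_{t \in [0,s]} J v_{n-1}(t,s,\vec\pi) = \sup_{t \in [0,s]} \sup_{n \ge 1} J v_{n-1}(t,s,\vec\pi)$

$= \sup_{t \in [0,s]} \sup_{n \ge 1} \Big\{ \mathbb{E}^{\vec\pi}\big[e^{-I(t)}\big] \cdot e^{-\rho t} \cdot H(\vec x(t,\vec\pi)) + \int_0^t e^{-\rho u} \sum_{i \in E} m_i(u,\vec\pi) \cdot \big( C(\vec x(u,\vec\pi)) + \lambda_i \cdot S_i v_{n-1}(s-u, \vec x(u,\vec\pi)) \big)\,du \Big\}$

$= \sup_{t \le s} \Big\{ \mathbb{E}^{\vec\pi}\big[e^{-I(t)}\big] \cdot e^{-\rho t} \cdot H(\vec x(t,\vec\pi)) + \int_0^t e^{-\rho u} \sum_{i \in E} m_i(u,\vec\pi) \cdot \big( C(\vec x(u,\vec\pi)) + \lambda_i \cdot S_i v(s-u, \vec x(u,\vec\pi)) \big)\,du \Big\}$

$= \sup_{t \in [0,s]} J v(t,s,\vec\pi) = \sup_{t \in [0,s]} J V(t,s,\vec\pi)$,

where the fifth equality is from (3.7) and the sixth equality is by the bounded convergence theorem since we have $\|v_m(\cdot,\cdot)\| \le \|v(\cdot,\cdot)\| \le \|H(\cdot)\| + T \|C(\cdot)\|$ for all $m \in \mathbb{N}$.

Let $W(\cdot,\cdot)$ be another solution of $W(s,\vec\pi) = J_0 W(s,\vec\pi)$, such that $W(s,\vec\pi) \ge H(\vec\pi) = v_0(s,\vec\pi)$. Applying Remark 3.2 we obtain $W(s,\vec\pi) = J_0 W(s,\vec\pi) \ge \sup_{t \in [0,s]} J v_0(t,s,\vec\pi) = v_1(s,\vec\pi)$. By induction, $W(s,\vec\pi) \ge v_n(s,\vec\pi)$ for all $n$ and hence $W(s,\vec\pi) \ge \lim_{n \to \infty} v_n(s,\vec\pi) = V(s,\vec\pi)$. □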

We finally close this section with the following result which will be useful in Section 4 in estab- 
lishing an optimal stopping time. 

Lemma 3.5. For deterministic times $u \le t \le s$, and for a bounded function $w(\cdot,\cdot)$ we have

(3.10)  $Jw(t,s,\vec\pi) = Jw(u,s,\vec\pi) + \mathbb{P}^{\vec\pi}\{\sigma_1 > u\} \cdot e^{-\rho u} \cdot \big( Jw(t-u, s-u, \vec x(u,\vec\pi)) - H(\vec x(u,\vec\pi)) \big)$.

Corollary 3.2. Let $w$ be a bounded function as in Lemma 3.5. Taking the supremum in (3.10) for fixed $u$ and $s$ we obtain

$\sup_{t \in [u,s]} Jw(t,s,\vec\pi) = Jw(u,s,\vec\pi) + \mathbb{P}^{\vec\pi}\{\sigma_1 > u\} \cdot e^{-\rho u} \cdot \big( J_0 w(s-u, \vec x(u,\vec\pi)) - H(\vec x(u,\vec\pi)) \big)$,

where $J_0$ is as defined in (3.5).
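
The passage from (3.10) to Corollary 3.2 is a change of variable in the supremum. As a short sketch, writing $t' = t - u$ (a symbol introduced here only for illustration):

$\sup_{t \in [u,s]} Jw(t-u, s-u, \vec x(u,\vec\pi)) = \sup_{t' \in [0,s-u]} Jw(t', s-u, \vec x(u,\vec\pi)) = J_0 w(s-u, \vec x(u,\vec\pi))$,

where the last equality is the definition of $J_0$ in (3.5) applied with horizon $s-u$ and initial point $\vec x(u,\vec\pi)$. The remaining quantities in (3.10), namely $Jw(u,s,\vec\pi)$, the non-negative factor $\mathbb{P}^{\vec\pi}\{\sigma_1 > u\} \cdot e^{-\rho u}$ and $H(\vec x(u,\vec\pi))$, do not depend on $t$, so the supremum passes through them.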

4. An Optimal Strategy 

Recall that the process $\vec\Pi$ has right-continuous paths (with left limits), and the functions $V(\cdot,\cdot)$ and $H(\cdot)$ are continuous due to Corollary 3.1. Hence the paths of the process $V(t,\vec\Pi_t) - H(\vec\Pi_t)$ are also right-continuous and have left limits. Therefore, for $\varepsilon \ge 0$ the random time

(4.1)  $U_\varepsilon(s,\vec\pi) = \inf\{ t \in [0,s] : V(s-t, \vec\Pi_t) - \varepsilon \le H(\vec\Pi_t) \}$

is a well-defined $\mathbb{F}$-stopping time. Observe that we have $U_\varepsilon(s,\vec\pi) \wedge \sigma_1 = r_\varepsilon(s,\vec\pi) \wedge \sigma_1$, where

(4.2)  $r_\varepsilon(s,\vec\pi) = \inf\{ t \in [0,s] : V(s-t, \vec x(t,\vec\pi)) - \varepsilon \le H(\vec x(t,\vec\pi)) \}$,

which can be considered as the deterministic counterpart of (4.1).

Remark 4.1. For $r_\varepsilon(s,\vec\pi)$ defined in (4.2) we have

(4.3)  $\sup_{t \in [0,s]} JV(t,s,\vec\pi) = \sup_{t \in [r_\varepsilon(s,\vec\pi),\,s]} JV(t,s,\vec\pi)$.

Proof. For $t < r_\varepsilon(s,\vec\pi)$, Proposition 3.3 and Corollary 3.2 give

$JV(t,s,\vec\pi) = \sup_{u \in [t,s]} JV(u,s,\vec\pi) - \mathbb{P}^{\vec\pi}\{\sigma_1 > t\}\, e^{-\rho t} \big( V(s-t, \vec x(t,\vec\pi)) - H(\vec x(t,\vec\pi)) \big)$.

Since $t < r_\varepsilon(s,\vec\pi)$ we have $V(s-t, \vec x(t,\vec\pi)) - H(\vec x(t,\vec\pi)) > \varepsilon$. Hence

$JV(t,s,\vec\pi) < \sup_{u \in [t,s]} JV(u,s,\vec\pi) - \varepsilon\, \mathbb{P}^{\vec\pi}\{\sigma_1 > t\}\, e^{-\rho t} \le \sup_{u \in [0,s]} JV(u,s,\vec\pi) - \varepsilon\, \mathbb{P}^{\vec\pi}\{\sigma_1 > t\}\, e^{-\rho t} \le \sup_{u \in [0,s]} JV(u,s,\vec\pi)$.

Therefore the supremum in $\sup_{t \in [0,s]} JV(t,s,\vec\pi)$ must be achieved on $[r_\varepsilon(s,\vec\pi), s]$ and (4.3) follows. □

Proposition 4.1. The stopping time $U_\varepsilon(s,\vec\pi)$ defined in (4.1) is an $\varepsilon$-optimal stopping time for the problem in (2.3), i.e.,

(4.4)  $\mathbb{E}^{\vec\pi}\Big[ \int_0^{U_\varepsilon(s,\vec\pi)} e^{-\rho t} C(\vec\Pi_t)\,dt + e^{-\rho U_\varepsilon(s,\vec\pi)} H\big(\vec\Pi_{U_\varepsilon(s,\vec\pi)}\big) \Big] \ge V(s,\vec\pi) - \varepsilon$,

for all $\varepsilon \ge 0$ and $(s,\vec\pi) \in [0,T] \times D$.

Before proceeding with the proof of Proposition 4.1, we first state an immediate consequence of this result.

Corollary 4.1. The stopping time $U_0(T,\vec\pi)$ is an optimal rule for the stopping problem of (2.3), and the pair $(U_0(T,\vec\pi), d(U_0(T,\vec\pi)))$ is an optimal admissible strategy for the problem in (1.5).

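Proof of Proposition 4.1. Throughout the proof we write $U_\varepsilon = U_\varepsilon(s,\vec\pi)$ and $r_\varepsilon = r_\varepsilon(s,\vec\pi)$ for brevity. Let us define

(4.5)  $Z_t \triangleq \int_0^t e^{-\rho u} C(\vec\Pi_u)\,du + e^{-\rho t} V(s-t, \vec\Pi_t)$,  $t \in [0,s]$,

which is a bounded process on $t \in [0,s] \subseteq [0,T]$. We will show that the stopped process $\{Z_{t \wedge U_\varepsilon}\}_{t \in [0,s]}$ is a martingale and satisfies

(4.6)  $\mathbb{E}^{\vec\pi}[Z_{U_\varepsilon}] = Z_0 = V(s,\vec\pi)$.

The process $Z$ captures the natural idea that one should not stop as long as the value function (i.e. the continuation value) is larger than the immediate reward. Note that $\varepsilon$-optimality of $U_\varepsilon$ follows easily from (4.6) since this equality would imply

(4.7)  $V(s,\vec\pi) = \mathbb{E}^{\vec\pi}[Z_{U_\varepsilon}] = \mathbb{E}^{\vec\pi}\Big[ \int_0^{U_\varepsilon} e^{-\rho t} C(\vec\Pi_t)\,dt + e^{-\rho U_\varepsilon} V(s - U_\varepsilon, \vec\Pi_{U_\varepsilon}) \Big]$
$\le \mathbb{E}^{\vec\pi}\Big[ \int_0^{U_\varepsilon} e^{-\rho t} C(\vec\Pi_t)\,dt + e^{-\rho U_\varepsilon} \big( H(\vec\Pi_{U_\varepsilon}) + \varepsilon \big) \Big] \le \mathbb{E}^{\vec\pi}\Big[ \int_0^{U_\varepsilon} e^{-\rho t} C(\vec\Pi_t)\,dt + e^{-\rho U_\varepsilon} H(\vec\Pi_{U_\varepsilon}) \Big] + \varepsilon$,

due to regularity of the paths $t \mapsto V(t,\vec\Pi_t) - H(\vec\Pi_t)$. In the remainder of the proof we will show (4.6) by establishing

(4.8)  $\mathbb{E}^{\vec\pi}[Z_{U_\varepsilon \wedge \sigma_m}] = Z_0$,  for $m = 1, 2, \ldots$,

inductively. After taking the limit as $m \to \infty$ in the equality above, we will then obtain (4.6) by the bounded convergence theorem.

First, consider the equality (4.8) for $m = 1$. Recall that $U_\varepsilon \wedge \sigma_1 = r_\varepsilon \wedge \sigma_1$. Then

$\mathbb{E}^{\vec\pi}[Z_{U_\varepsilon \wedge \sigma_1}] = \mathbb{E}^{\vec\pi}[Z_{r_\varepsilon \wedge \sigma_1}] = \mathbb{E}^{\vec\pi}\Big[ \int_0^{r_\varepsilon \wedge \sigma_1} e^{-\rho t} C(\vec\Pi_t)\,dt + 1_{\{\sigma_1 > r_\varepsilon\}} e^{-\rho r_\varepsilon} H(\vec\Pi_{r_\varepsilon}) + 1_{\{\sigma_1 \le r_\varepsilon\}} e^{-\rho \sigma_1} V(s - \sigma_1, \vec\Pi_{\sigma_1})$
$\qquad + 1_{\{\sigma_1 > r_\varepsilon\}} \cdot e^{-\rho r_\varepsilon} \big( V(s - r_\varepsilon, \vec\Pi_{r_\varepsilon}) - H(\vec\Pi_{r_\varepsilon}) \big) \Big]$
$= JV(r_\varepsilon, s, \vec\pi) + e^{-\rho r_\varepsilon} \cdot \mathbb{P}^{\vec\pi}\{\sigma_1 > r_\varepsilon\} \cdot \big( V(s - r_\varepsilon, \vec x(r_\varepsilon,\vec\pi)) - H(\vec x(r_\varepsilon,\vec\pi)) \big)$
$= \sup_{u \in [r_\varepsilon, s]} JV(u,s,\vec\pi)$,

where we used Proposition 3.3 and Corollary 3.2 for the last equality. By Remark 4.1, we get

$\mathbb{E}^{\vec\pi}[Z_{U_\varepsilon \wedge \sigma_1}] = \sup_{u \in [r_\varepsilon, s]} JV(u,s,\vec\pi) = J_0 V(s,\vec\pi) = V(s,\vec\pi) = Z_0$,

and this establishes the result for $m = 1$.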

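Now suppose by induction that (4.8) is true for $m \ge 1$ and consider the equality (with $U_\varepsilon = U_\varepsilon(s,\vec\pi)$)

(4.9)  $\mathbb{E}^{\vec\pi}[Z_{U_\varepsilon \wedge \sigma_{m+1}}] = \mathbb{E}^{\vec\pi}\big[ 1_{\{U_\varepsilon < \sigma_1\}} Z_{U_\varepsilon} + 1_{\{U_\varepsilon \ge \sigma_1\}} Z_{U_\varepsilon \wedge \sigma_{m+1}} \big]$
$= \mathbb{E}^{\vec\pi}\Big[ 1_{\{U_\varepsilon < \sigma_1\}} \Big( \int_0^{U_\varepsilon} e^{-\rho t} C(\vec\Pi_t)\,dt + e^{-\rho U_\varepsilon} V(s - U_\varepsilon, \vec\Pi_{U_\varepsilon}) \Big)$
$\qquad + 1_{\{U_\varepsilon \ge \sigma_1\}} \Big( \int_0^{U_\varepsilon \wedge \sigma_{m+1}} e^{-\rho t} C(\vec\Pi_t)\,dt + e^{-\rho (U_\varepsilon \wedge \sigma_{m+1})} V\big(s - U_\varepsilon \wedge \sigma_{m+1}, \vec\Pi_{U_\varepsilon \wedge \sigma_{m+1}}\big) \Big) \Big]$.

On the event $\{U_\varepsilon(s,\vec\pi) \ge \sigma_1\}$, we have $U_\varepsilon(s,\vec\pi) \wedge \sigma_{m+1} = \sigma_1 + [U_\varepsilon(s - \sigma_1, \vec\Pi_{\sigma_1}) \wedge \sigma_m] \circ \theta_{\sigma_1}$, where $\theta$ is the time-shift operator on $\Omega$; i.e., $X_t \circ \theta_s = X_{t+s}$. Using the strong Markov property of $\vec\Pi$, equation (4.9) becomes

(4.10)  $\mathbb{E}^{\vec\pi}[Z_{U_\varepsilon \wedge \sigma_{m+1}}] = \mathbb{E}^{\vec\pi}\Big[ 1_{\{U_\varepsilon < \sigma_1\}} \Big( \int_0^{U_\varepsilon} e^{-\rho t} C(\vec\Pi_t)\,dt + e^{-\rho U_\varepsilon} V(s - U_\varepsilon, \vec\Pi_{U_\varepsilon}) \Big)$
$\qquad + 1_{\{U_\varepsilon \ge \sigma_1\}} \Big( \int_0^{\sigma_1} e^{-\rho t} C(\vec\Pi_t)\,dt + e^{-\rho \sigma_1} f(s - \sigma_1, \vec\Pi_{\sigma_1}) \Big) \Big]$,

where

(4.11)  $f(u,\vec\pi) \triangleq \mathbb{E}^{\vec\pi}\Big[ \int_0^{U_\varepsilon(u,\vec\pi) \wedge \sigma_m} e^{-\rho t} C(\vec\Pi_t)\,dt + e^{-\rho (U_\varepsilon(u,\vec\pi) \wedge \sigma_m)} V\big(u - U_\varepsilon(u,\vec\pi) \wedge \sigma_m, \vec\Pi_{U_\varepsilon(u,\vec\pi) \wedge \sigma_m}\big) \Big] = V(u,\vec\pi)$,

by the induction hypothesis for $m$. Combining (4.10) and (4.11) we get

$\mathbb{E}^{\vec\pi}[Z_{U_\varepsilon \wedge \sigma_{m+1}}] = \mathbb{E}^{\vec\pi}\Big[ 1_{\{U_\varepsilon < \sigma_1\}} \Big( \int_0^{U_\varepsilon} e^{-\rho t} C(\vec\Pi_t)\,dt + e^{-\rho U_\varepsilon} V(s - U_\varepsilon, \vec\Pi_{U_\varepsilon}) \Big) \Big]$
$\qquad + \mathbb{E}^{\vec\pi}\Big[ 1_{\{U_\varepsilon \ge \sigma_1\}} \Big( \int_0^{\sigma_1} e^{-\rho t} C(\vec\Pi_t)\,dt + e^{-\rho \sigma_1} V(s - \sigma_1, \vec\Pi_{\sigma_1}) \Big) \Big]$
$= \mathbb{E}^{\vec\pi}\big[ 1_{\{U_\varepsilon < \sigma_1\}} Z_{U_\varepsilon} + 1_{\{U_\varepsilon \ge \sigma_1\}} Z_{\sigma_1} \big] = \mathbb{E}^{\vec\pi}[Z_{U_\varepsilon \wedge \sigma_1}] = Z_0$,

where the last equality follows from our result for $m = 1$. Hence we have $\mathbb{E}^{\vec\pi}[Z_{U_\varepsilon \wedge \sigma_{m+1}}] = Z_0$, and this gives (4.8) for $m + 1$. □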

4.1. Stopping and continuation regions. Let

(4.12)  $C_T \triangleq \{ (s,\vec\pi) \in [0,T] \times D : V(s,\vec\pi) > H(\vec\pi) \}$,
        $\Gamma_T \triangleq \{ (s,\vec\pi) \in [0,T] \times D : V(s,\vec\pi) = H(\vec\pi) \}$

denote the continuation and stopping regions respectively. The stopping region can further be decomposed as the union $\cup_{k \in A} \Gamma_{T,k}$ of the regions

(4.13)  $\Gamma_{T,k} \triangleq \{ (s,\vec\pi) \in [0,T] \times D : V(s,\vec\pi) = H_k(\vec\pi) \}$,  $k \in A$,

where $H_k$ is defined in (2.4). Corollary 4.1 states that in the optimal solution $(U_0(T,\vec\pi), d(U_0(T,\vec\pi)))$, one observes the process $\vec\Pi$ until $U_0(T,\vec\pi)$, when it enters the region $\Gamma_T$. At this time, if $\vec\Pi$ is in the set $\Gamma_{T,k}$ we take $d(U_0(T,\vec\pi)) = k$; that is, we select the $k$'th action in the action set $A$.

Remark 4.2. The definition of the value function $V$ in (2.3) implies that the mapping $s \mapsto V(s,\vec\pi)$ is non-decreasing. Therefore if $(s,\vec\pi) \in \Gamma_{T,k}$ for some $(s,\vec\pi) \in [0,T] \times D$, then we have $(t,\vec\pi) \in \Gamma_{T,k}$ for all $t \le s$. In other words, each region $\Gamma_{T,k}$ is growing and the continuation region $C_T$ is shrinking as time to maturity decreases.

Remark 4.3. For fixed $s \le T$, let $(s,\vec\pi_1)$ and $(s,\vec\pi_2)$ be two points in the region $\Gamma_{T,k}$, and let $\alpha \in (0,1)$. As the upper envelope of convex mappings $\vec\pi \mapsto v_m(s,\vec\pi)$ (see Lemma 3.4 and Corollary 3.1), the mapping $\vec\pi \mapsto V(s,\vec\pi)$ is convex for each $s \in [0,T]$. Using this property we obtain

$H_k(\alpha \vec\pi_1 + (1-\alpha) \vec\pi_2) \le V(s, \alpha \vec\pi_1 + (1-\alpha) \vec\pi_2) \le \alpha V(s,\vec\pi_1) + (1-\alpha) V(s,\vec\pi_2)$
$= \alpha H_k(\vec\pi_1) + (1-\alpha) H_k(\vec\pi_2) = H_k(\alpha \vec\pi_1 + (1-\alpha) \vec\pi_2)$,

which implies that $(s, \alpha \vec\pi_1 + (1-\alpha) \vec\pi_2) \in \Gamma_{T,k}$, and the region $\Gamma_{T,k} \cap (\{s\} \times D)$ is convex for each fixed $s \le T$ and $k \in A$.

Remark 4.4. The stopping region is never empty since the decision maker has to select an action eventually, at the latest at the terminal time $T$. That is, $\Gamma_T \supseteq \{(0,\vec\pi) ; \vec\pi \in D\} \ne \emptyset$. The region $\{(s,\vec\pi) \in \Gamma_T : s > 0\}$ may however be empty. In an example where $\min_{i \in E} c_i > 0$ and the $\mu_{k,i}$'s are all the same it is never optimal to stop prior to terminal time $T$.

Note that the region $\{(s,\vec\pi) \in \Gamma_T : s > 0\}$ may be non-empty but still may have an empty interior. For example, let us consider the hypothesis testing in (1.3). In this minimization problem, all the states of the unobservable Markov process are absorbing, and each component $\Pi_i(t) = \mathbb{P}\{M_t = i \mid \mathcal{F}_t\} = \mathbb{P}\{M_0 = i \mid \mathcal{F}_t\}$ of the process $\vec\Pi$ is a martingale. Since the terminal reward function of the corresponding stopping problem (see (2.4)) $H(\cdot) = \min_{k \in E} H_k(\cdot)$ is concave, the process $H(\vec\Pi_t)$ is a supermartingale on $[0,T]$. If we select $\rho = 0$ and $c_i = 0$ for all $i \in E$ in (1.3), it is therefore never optimal to stop early on the interior of $\{(s,\vec\pi) \in \Gamma_T : s > 0\}$. In this case, there is no penalty associated with a delay in the decision. Hence the DM will choose to observe as much as possible prior to a decision unless she knows for sure which hypothesis is correct.

Lemma 4.1. For $i \in E$, let $A^*(i) = \{k \in A : \mu_{k,i} = \max_{j \in A} \mu_{j,i}\}$. If the inequality $c_i - \rho \mu_{k,i} + \sum_{j \ne i} (\mu_{k,j} - \mu_{k,i})\, q_{i,j} > 0$ holds for all $k \in A^*(i)$, then there exists $\underline{\pi}_i < 1$ such that $\{(s,\vec\pi) \in (0,T] \times D : \pi_i \ge \underline{\pi}_i\} \subseteq C_T$. Moreover, $\underline{\pi}_i$ can be selected independent of $T$.

If the hidden process $M$ is known to be in state $i \in E$, then the expression $-\rho \mu_{k,i}$ is the instantaneous decay of the payoff from selecting action $k \in A$ immediately, and $c_i$ is the instantaneous cost of waiting. Moreover, under action $k \in A$, the term $\sum_{j \ne i} (\mu_{k,j} - \mu_{k,i})\, q_{i,j}$ is the marginal rate of return from waiting for the hidden process $M$ to jump to another state. Therefore the sum of these three terms appearing in Lemma 4.1 is the instantaneous net return enjoyed by the DM under action $k \in A$. Lemma 4.1 indicates that if there is strong posterior evidence that $M$ is in state $i$, and if the instantaneous net return is positive under all favorable actions (whose terminal reward dominates others around the $i$'th corner of $D$), the decision maker should not stop at that point (unless $T = 0$).
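
As a numerical illustration of the condition in Lemma 4.1, the following Python sketch evaluates the instantaneous net return at the Growth state using the insurance-launch parameters quoted in Section 6.1 ($\mu = [6, 1, -3]$ for launching, zero reward for abandoning, $\rho = 0.1$, $c = -0.3$, Growth row of the generator $[2, -4, 2]$). The function names and the dictionary layout are assumptions made for this example only.

```python
def net_return(i, k, c, rho, mu, Q):
    """c_i - rho*mu[k][i] + sum over j != i of (mu[k][j] - mu[k][i]) * Q[i][j]."""
    n = len(c)
    drift = sum((mu[k][j] - mu[k][i]) * Q[i][j] for j in range(n) if j != i)
    return c[i] - rho * mu[k][i] + drift


def favorable_actions(i, mu):
    """Actions whose terminal reward is maximal at the i-th corner of D."""
    best = max(mu[k][i] for k in mu)
    return [k for k in mu if mu[k][i] == best]


# States: 0 = Boom, 1 = Growth, 2 = Recession.
c = [-0.3, -0.3, -0.3]                 # running cost, same in every state
rho = 0.1                              # discount rate
mu = {1: [6.0, 1.0, -3.0],             # action d = 1: launch
      0: [0.0, 0.0, 0.0]}              # action d = 0: abandon, no cashflow
Q = {1: [2.0, -4.0, 2.0]}              # only the Growth row is needed here

actions_at_growth = favorable_actions(1, mu)
r_growth = net_return(1, 1, c, rho, mu, Q)
keep_observing_near_growth = all(
    net_return(1, k, c, rho, mu, Q) > 0 for k in actions_at_growth
)
```

Under these inputs the only favorable action at the Growth corner is launching, and the net return matches the value 1.6 reported in Section 6.1.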

4.2. Stopping regions for reward maximization with running cost. Here, we consider the problem in (2.3) with the assumption $c_i \le 0$ (running costs) for $i \in E$, and $\bar\mu \triangleq \max_{k,i} \mu_{k,i} > 0$ (terminal rewards). The second condition is not restrictive if $\rho = 0$ since we can always add (and subtract) the same constant to (and from) the terminal reward function.

Let us define

(4.14)  $I^* = \{ i \in E : \max_{k \in A} \mu_{k,i} = \bar\mu \}$,

which is the set of the states of $M$ at which the DM can get the highest terminal reward. Since $c_i \le 0$ for all $i \in E$, we have $\cup_{i \in I^*} \{(s,\vec\pi) : s \in [0,T],\ \pi_i = 1\} \subseteq \Gamma_T$. That is, the DM stops whenever the process $\vec\Pi$ reaches a point of global maximum of the terminal reward function $H(\cdot)$.

In general, if there is a penalty associated with waiting, we expect that it is optimal to stop at the points $(s,\vec\pi)$ for which the "best" component $\pi_i$, $i \in I^*$, is sufficiently high, for any $s \ge 0$. Lemma 4.2 provides a sufficient condition for this to be true. It implies that if the discount rate is strictly positive, or if the cost of waiting for the highest reward is strictly positive, then we stop whenever $\pi_i$, for $i \in I^*$, is relatively high regardless of the remaining time to maturity.

Lemma 4.2. Let $i \in I^*$. If $\rho > 0$, or $c_i < 0$, then there exists a number $\overline{\pi}_i < 1$ such that

$\Gamma_T \supseteq \{ (s,\vec\pi) \in [0,T] \times D : \pi_i \ge \overline{\pi}_i \}$,

and the value of $\overline{\pi}_i$ can be selected free of the time to maturity $T$.

Remark 4.5. If $H(\cdot) \ge 0$, the statement of the stopping problem in (2.3) implies that the value function $V$ is non-increasing as a function of the discount factor $\rho$. If we denote the dependence of the stopping region on $\rho$ with $\Gamma_T(\rho)$, then we have $\Gamma_T(\rho_1) \subseteq \Gamma_T(\rho_2)$ whenever $\rho_1 \le \rho_2$. Moreover, the dynamics of the process $\vec\Pi$ are independent of $\rho$ and $U_0(s,\vec\pi)$ is the hitting time of $\vec\Pi$ to $\Gamma_T$. Therefore, the time that the DM can afford for observing the process $X$ in the presence of a lower discount factor is no less than that spent under heavier discounting.

A similar claim also holds for the dependence of $U_0(s,\vec\pi)$ and $\Gamma_T$ on the running costs $c_i$. Namely, an observer with lower (in absolute value) running costs stops no sooner than another one with heavier running costs.

4.3. A nearly-optimal strategy. On a practical level, one cannot compute $V$ directly, but instead computes the approximate value functions $V_m$ defined in (3.2) and employs the corresponding nearly-optimal strategies (see (4.15)). It is therefore important to know the error associated with this approximation. For a given $\varepsilon > 0$, let

$m = \inf\Big\{ k \in \mathbb{N} : (T \|C\| + 2 \|H\|) \cdot \Big( \frac{\bar\lambda T}{k-1} \Big)^{1/2} \cdot \Big( \frac{\bar\lambda}{2\rho + \bar\lambda} \Big)^{k/2} < \varepsilon/2 \Big\}$,

such that $\|V_m - V\| < \varepsilon/2$ on $[0,T] \times D$ via (3.3). Next, let us define the stopping times

(4.15)  $U_\varepsilon^{(m)}(s,\vec\pi) = \inf\{ t \in [0,s] : V_m(s-t, \vec\Pi_t) - \varepsilon/2 \le H(\vec\Pi_t) \}$.

The regularity of the paths $t \mapsto \vec\Pi_t$ implies that, with $U = U_\varepsilon^{(m)}(s,\vec\pi)$,

$V(s - U, \vec\Pi_U) - H(\vec\Pi_U) \le \varepsilon$.

Then the arguments in the proof of Proposition 4.1 (see (4.5), (4.6), and (4.7)) can easily be modified to show that

(4.16)  $V(s,\vec\pi) = \mathbb{E}^{\vec\pi}\Big[ \int_0^{U} e^{-\rho t} C(\vec\Pi_t)\,dt + e^{-\rho U} V(s - U, \vec\Pi_U) \Big] \le \mathbb{E}^{\vec\pi}\Big[ \int_0^{U} e^{-\rho t} C(\vec\Pi_t)\,dt + e^{-\rho U} H(\vec\Pi_U) \Big] + \varepsilon$.

Hence, if we apply the admissible strategy $\big(U_\varepsilon^{(m)}(T,\vec\pi), d(U_\varepsilon^{(m)}(T,\vec\pi))\big)$, which requires computing (3.2) only up to $m$ defined above, the resulting error is no more than $\varepsilon$.
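
The choice of $m$ above can be evaluated directly. The Python sketch below searches for the smallest $k$ satisfying the displayed inequality, assuming the bound has the form $(T\|C\| + 2\|H\|)(\bar\lambda T/(k-1))^{1/2}(\bar\lambda/(2\rho+\bar\lambda))^{k/2}$ with $\bar\lambda$ the largest arrival rate. The function names and the sample inputs (loosely based on Section 6.1) are illustrative assumptions.

```python
import math


def error_bound(k, T, rho, lam_bar, norm_C, norm_H):
    """Assumed form of the bound on ||V_k - V|| for k >= 2."""
    return ((T * norm_C + 2.0 * norm_H)
            * math.sqrt(lam_bar * T / (k - 1))
            * (lam_bar / (2.0 * rho + lam_bar)) ** (k / 2.0))


def min_iterations(eps, T, rho, lam_bar, norm_C, norm_H, k_max=1_000_000):
    """Smallest k >= 2 whose bound is below eps/2 (k = 1 divides by zero)."""
    for k in range(2, k_max + 1):
        if error_bound(k, T, rho, lam_bar, norm_C, norm_H) < eps / 2.0:
            return k
    raise RuntimeError("bound not reached within k_max iterations")


# Illustrative inputs: T = 0.8, rho = 0.1, largest rate 5, ||C|| = 0.3, ||H|| = 6.
params = dict(T=0.8, rho=0.1, lam_bar=5.0, norm_C=0.3, norm_H=6.0)
m = min_iterations(eps=0.01, **params)
```

When $\rho = 0$ the geometric factor equals one and the bound decays only like $(k-1)^{-1/2}$, so the required number of iterations grows quickly as $\varepsilon$ shrinks.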

4.4. Infinite horizon problem as an approximation. In general, if there is a strict penalty for waiting, it is likely that the DM will make a decision prior to the final time $T$ for moderate or large values of $T$. In this case, the constraint $\tau \le T$ in (2.3) is of less importance, and one essentially faces an infinite horizon stopping problem. Solving the infinite horizon problem can be computationally more appealing since we eliminate the time dimension of the state space $[0,T] \times D$. Below, we show that the value function of the finite-horizon problem converges uniformly to that of the infinite horizon under the assumption

(4.17)  either "$\rho > 0$" or "$\max_{i \in E} c_i < 0$".

The infinite horizon problem is defined as in (2.3) (and (1.5)) by removing the constraint $\tau \le T$. With the notation in (2.3), let $V(\infty,\vec\pi)$ be the value function of this stopping problem.

Lemma 4.3. As $T \nearrow \infty$, the function $V(T,\vec\pi)$ converges to $V(\infty,\vec\pi)$ uniformly on $D$, and we have

(4.18)  $V(T,\vec\pi) \le V(\infty,\vec\pi) \le V(T,\vec\pi) + \mathrm{Err}(T)$, for all $\vec\pi \in D$ and $T > 0$,

where

$\mathrm{Err}(T) \triangleq \begin{cases} e^{-\rho T} \big( \|C\| + 2 \|H\| \big), & \text{if } \rho > 0, \\ \dfrac{2 \|H\| \big( \min_{k,i} \mu_{k,i} - \max_{k,i} \mu_{k,i} \big)}{T \max_{i \in E} c_i}, & \text{if } \rho = 0 \text{ and } \max_{i \in E} c_i < 0. \end{cases}$

The explicit error bound on the rate of convergence allows us to approximate $V(T,\cdot)$ with the value function of the infinite horizon problem when $T$ is large. The function $V(\infty,\vec\pi)$ can be computed sequentially as in Section 3. That is, if we define the non-decreasing sequence

(4.19)  $V_m(\infty,\vec\pi) = \sup_{\tau \ge 0} \mathbb{E}^{\vec\pi}\Big[ \int_0^{\tau \wedge \sigma_m} e^{-\rho t} C(\vec\Pi_t)\,dt + e^{-\rho (\tau \wedge \sigma_m)} H(\vec\Pi_{\tau \wedge \sigma_m}) \Big]$,  $m \in \mathbb{N}$,

then it can be shown that the elements of this sequence can be computed by applying a functional operator $\tilde J_0$, which is obtained from the operator $J_0$ in (3.7) after replacing the constraint $t \in [0,s]$ with $t \ge 0$. Also, note that the new operator $\tilde J_0$ acts on functions defined on $D$ only. The proof of these statements can be obtained by modifying the arguments of Section 3, or those in [12, Section 3]. Moreover, following the proof of Proposition 3.1 and the arguments of Section 4.3, we have

$\|V_m(\infty,\cdot) - V(\infty,\cdot)\| \le \mathrm{Err}_\infty(m)$,

where the bound $\mathrm{Err}_\infty(m)$ takes one form if $\rho > 0$ (in terms of the ratio $\bar\lambda/(\rho + \bar\lambda)$) and another if $\rho = 0$ and $\max_{i \in E} c_i < 0$ (in terms of $\max_{i \in E} c_i$), and the stopping time

(4.20)  $U_\varepsilon^{(m)}(\infty,\vec\pi) \triangleq \inf\{ t \ge 0 : V_m(\infty,\vec\Pi_t) - \varepsilon \le H(\vec\Pi_t) \}$

is $\varepsilon$-optimal for the infinite horizon problem (see also [12, Section 4.1]).

Note that for large $m$, the function $V_m(\infty,\cdot)$ approximates the function $V(\infty,\cdot)$, and for large $T$, $V(\infty,\cdot)$ is a good approximation for $V(T,\cdot)$. However, the stopping rule in (4.20) is not a good substitute for the optimal time $U_0(T,\vec\pi)$ since the former may not be less than $T$ almost surely. Moreover, since $U_\varepsilon^{(m)}(\infty,\vec\pi)$ may be greater than $U_0(T,\vec\pi)$, Proposition 4.1 is not necessarily true. In particular, the martingale property (4.6) may fail. Nevertheless, if we apply the rule $U_\varepsilon^{(m)}(\infty,\vec\pi) \wedge T$, we can still control the error for large $T$. Indeed, in Appendix A2, we show that, with $\bar U = U_\varepsilon^{(m)}(\infty,\vec\pi) \wedge T$,

(4.21)  $V(T,\vec\pi) \le \mathbb{E}^{\vec\pi}\Big[ \int_0^{\bar U} e^{-\rho t} C(\vec\Pi_t)\,dt + e^{-\rho \bar U} H(\vec\Pi_{\bar U}) \Big] + \varepsilon + \mathrm{Err}_\infty(m) + \mathrm{Err}_\infty(0) \cdot \mathrm{Err}(T)$.

Hence, if $T$ is large enough (so that the last term in (4.21) is small), by taking $\varepsilon$ in (4.20) small for a large value of $m$, the error associated with applying $U_\varepsilon^{(m)}(\infty,\vec\pi) \wedge T$ can be reduced to acceptable levels.



5. Discrete information costs 

As the case studies of Section 1 demonstrate, the objective function in (1.5) is applicable to a variety of economic settings. This has allowed us to provide a unified treatment of many disparate models. Returning to the economic interpretation of the running costs appearing in the first term in (1.5), in a typical setting they represent information acquisition expenses, such as observation expenses, subscription costs to market data and holding outlays. In such a case, it is natural to model the total cost incurred by decision time $\tau$ as the integral $\int_0^\tau e^{-\rho t} c\,dt$ where $c$ is interpreted as the nominal running cost and $\rho$ is the interest rate.

Alternatively, the costs can correspond to opportunity costs, e.g. if $M$ is the profitability of a new product then the opportunity costs of not launching the product should depend on $\{M_t\}_{t \in [0,T]}$. This motivates the consideration of $\int_0^\tau e^{-\rho t} \sum_{i \in E} c_i 1_{\{M_t = i\}}\,dt$ where $c_i \in \mathbb{R}$ and $\rho$ can again be interpreted as the discount factor.

Finally, observation costs may be discrete and be incurred only when new information arrives. This, for example, happens if new information corresponds to opportunities lost (e.g. deals signed by competitors), leading to a cost structure of the form $\sum_{j=1}^{N_\tau} e^{-\rho \sigma_j} K(Y_j)$. Here, $N_\tau$ is the number of arrivals by time $\tau$, $(\sigma_j, Y_j)$ are the arrival times and marks respectively, and $K(Y_j)$ is the cost incurred upon an arrival of size $Y_j$ (with $K : \mathbb{R}^d \to \mathbb{R}$ satisfying $\nu_i K^+ \triangleq \int_{\mathbb{R}^d} K^+(y)\,\nu_i(dy) < \infty$, $\forall i \in E$).

In the third case, one deals with the objective function

(5.1)  $U(T,\vec\pi) = \sup_{(\tau,d)} \mathbb{E}^{\vec\pi}\Big[ \sum_{j=1}^{N_\tau} e^{-\rho \sigma_j} K(Y_j) + e^{-\rho \tau} \sum_{k \in A} 1_{\{d = k\}} \Big( \sum_{i \in E} \mu_{k,i} 1_{\{M_\tau = i\}} \Big) \Big]$

by solving the equivalent stopping problem

$V(T,\vec\pi) = \sup_{\tau \le T} \mathbb{E}^{\vec\pi}\Big[ \sum_{j=1}^{N_\tau} e^{-\rho \sigma_j} K(Y_j) + e^{-\rho \tau} H(\vec\Pi_\tau) \Big]$
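as in Proposition 2.3. One can verify that the sequential approximation method of Section 3 holds for the function $V$. Namely, if we define the sequence

$V_m(s,\vec\pi) \triangleq \sup_{\tau \le s} \mathbb{E}^{\vec\pi}\Big[ \sum_{j=1}^{m \wedge N_\tau} e^{-\rho \sigma_j} K(Y_j) + e^{-\rho (\tau \wedge \sigma_m)} H(\vec\Pi_{\tau \wedge \sigma_m}) \Big]$,  $m \in \mathbb{N}$,

it can be shown (see (3.5-3.8), Proposition 3.2) that we have $V_{m+1}(s,\vec\pi) = J_0 V_m(s,\vec\pi)$ where the operator $J_0$ is defined as

$J_0 w(s,\vec\pi) = \sup_{t \in [0,s]} \Big\{ \mathbb{E}^{\vec\pi}\big[e^{-I(t)}\big] \cdot e^{-\rho t} \cdot H(\vec x(t,\vec\pi)) + \int_0^t e^{-\rho u} \sum_{i \in E} m_i(u,\vec\pi) \cdot \lambda_i \Big( \int_{\mathbb{R}^d} K(y)\,\nu_i(dy) + S_i w(s-u, \vec x(u,\vec\pi)) \Big)\,du \Big\}$,

for a bounded function $w : [0,T] \times D \to \mathbb{R}$.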

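Clearly $\{V_m\}_{m \ge 0}$ is an increasing sequence. Using the inequality $\mathbb{E}^{\vec\pi}\big[ \sum_{j=1}^{N_T} K^+(Y_j) \big] \le (\max_{i \in E} \lambda_i)\, T \cdot (\max_{i \in E} \nu_i K^+)$ and the truncation arguments in the proof of Proposition 3.1, one can show that the sequence converges to $V$ uniformly with the error bound

$0 \le V - V_m \le \Big( (\max_{i \in E} \lambda_i)\, T \cdot (\max_{i \in E} \nu_i K^+) + 2 \|H\| \Big) \cdot \Big( \frac{\bar\lambda T}{m-1} \Big)^{1/2} \cdot \Big( \frac{\bar\lambda}{2\rho + \bar\lambda} \Big)^{m/2}$.

Arguments in Sections 3 and 4 can then be replicated to conclude that

$\mathbb{E}^{\vec\pi}\Big[ \sum_{j=1}^{N_{U_\varepsilon(s,\vec\pi)}} e^{-\rho \sigma_j} K(Y_j) + e^{-\rho U_\varepsilon(s,\vec\pi)} H\big(\vec\Pi_{U_\varepsilon(s,\vec\pi)}\big) \Big] \ge V(s,\vec\pi) - \varepsilon$,

for the stopping time $U_\varepsilon(s,\vec\pi) = \inf\{ t \in [0,s] : V(s-t, \vec\Pi_t) - \varepsilon \le H(\vec\Pi_t) \}$. Hence, the admissible strategy $(U_\varepsilon(s,\vec\pi), d(U_\varepsilon(s,\vec\pi)))$ is an $\varepsilon$-optimal strategy for the problem in (5.1), as expected.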
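
The bound on the expected positive part of the arrival costs used above can be seen in one line. The sketch assumes, as in the model description, that at time $t$ arrivals occur with intensity $\lambda_{M_t}$ and carry marks with distribution $\nu_{M_t}$:

$\mathbb{E}^{\vec\pi}\Big[ \sum_{j=1}^{N_T} K^+(Y_j) \Big] = \mathbb{E}^{\vec\pi}\Big[ \int_0^T \lambda_{M_t} \cdot \nu_{M_t} K^+ \,dt \Big] \le (\max_{i \in E} \lambda_i)\, T \cdot (\max_{i \in E} \nu_i K^+)$,

where the equality replaces the sum over arrivals by its compensator and the inequality bounds the integrand pointwise.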

Furthermore, other results of Section 4 can be adjusted for this new objective function. Below, we summarize these results in a remark.

Remark 5.1. Let $\nu_j K \triangleq \int_{\mathbb{R}^d} K(y)\,\nu_j(dy)$, for $j \in E$.

(i) For a given index $i \in E$, define $A^*(i) = \{k \in A : \mu_{k,i} = \max_{j \in A} \mu_{j,i}\}$ as in Lemma 4.1. If $-\rho \mu_{k,i} + \lambda_i \cdot \nu_i K + \sum_{j \ne i} (\mu_{k,j} - \mu_{k,i})\, q_{i,j} > 0$ holds for all $k \in A^*(i)$, then there exists some $\underline{\pi}_i < 1$ (for all $T > 0$) such that it is optimal to continue on the region $\{(0,T] \times D ;\ \pi_i \ge \underline{\pi}_i\}$.

(ii) Assume $\nu_j K \le 0$ for all $j \in E$, and $\bar\mu = \max_{k,i} \mu_{k,i} > 0$, and let $I^*$ be as in (4.14). For $i \in I^*$, if $\nu_i K < 0$ or $\rho > 0$ there exists a number $\overline{\pi}_i < 1$ (free of $T$) such that it is optimal to stop at the points $\vec\pi$ for which $\pi_i \ge \overline{\pi}_i$. That is: $\Gamma_T \supseteq \{[0,T] \times D ;\ \pi_i \ge \overline{\pi}_i\}$ for all $T > 0$.

(iii) In the case where $\nu_j K \le 0$ for all $j \in E$, and $H(\cdot) \ge 0$, the stopping region is monotone in $\rho$ and $\nu_j K$, for $j \in E$. Namely, if we increase one of these factors in absolute terms (keeping everything else fixed), the stopping region expands, and the DM is forced to make a decision sooner.

(iv) For a given $\varepsilon > 0$, let $m \in \mathbb{N}$ be such that $\|V_m(T,\cdot) - V(T,\cdot)\| \le \varepsilon/2$. Then the stopping time $U_\varepsilon^{(m)}(T,\vec\pi) = \inf\{ t \in [0,T] : V_m(T-t, \vec\Pi_t) - \varepsilon \le H(\vec\Pi_t) \}$ gives an $\varepsilon$-optimal strategy.

(v) If "$\rho > 0$" or "$K(\cdot) \le 0$ with $\max_{i \in E} \nu_i K < 0$", then $V(T,\cdot) \nearrow V(\infty,\cdot)$ uniformly as in (4.18) if we redefine

$\mathrm{Err}(T) \triangleq \begin{cases} e^{-\rho T} \big( \max_{i \in E} \lambda_i \cdot \max_{i \in E} \nu_i K^+ + 2 \|H(\cdot)\| \big), & \text{if } \rho > 0, \\ \dfrac{2 \|H(\cdot)\| \big( \min_{k,i} \mu_{k,i} - \max_{k,i} \mu_{k,i} \big)}{T \min_{j \in E} \lambda_j \cdot \max_{i \in E} \nu_i K}, & \text{if } \rho = 0,\ K(\cdot) \le 0 \text{ and } \max_{i \in E} \nu_i K < 0. \end{cases}$

6. Examples 

Below we provide numerical examples illustrating the use of our sequential approximation approach developed in Section 3. In each example, we approximate the value function by repeatedly (finitely many times) applying the operator $J_0$ in (3.5) starting with the initial function $H(\cdot)$. We set the number of iterations $m \in \mathbb{N}$ such that the error $\|V_m(\cdot) - V(\cdot)\|$ is negligible (see (3.3)).
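
The iteration scheme can be sketched generically in Python: apply a monotone operator repeatedly, starting from the terminal reward, until successive iterates differ by less than a tolerance in the sup-norm. The operator `toy_J0` below is NOT the operator of (3.5); it is a discrete-time optimal stopping operator on a three-state chain, used only as a stand-in so that the loop is runnable. All names and numbers are illustrative assumptions.

```python
def successive_approximation(J0, v0, tol=1e-4, max_iter=1000):
    """Iterate v_{m+1} = J0(v_m) until the sup-norm change is at most tol."""
    v = list(v0)
    for m in range(1, max_iter + 1):
        v_next = J0(v)
        gap = max(abs(a - b) for a, b in zip(v_next, v))
        v = v_next
        if gap <= tol:
            return v, m
    raise RuntimeError("no convergence within max_iter iterations")


# Stand-in problem: stop and collect H[i], or wait one discounted step.
H = [1.0, 0.0, 2.0]
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
beta = 0.9  # one-step discount factor


def toy_J0(w):
    """Monotone operator: max of stopping now and discounted continuation."""
    n = len(w)
    return [max(H[i], beta * sum(P[i][j] * w[j] for j in range(n)))
            for i in range(n)]


v, iterations = successive_approximation(toy_J0, H)
stopping_states = [i for i in range(len(H)) if v[i] == H[i]]
```

As in the continuous-time setting, the iterates increase from the terminal reward, and the stopping region is read off as the set where the limit equals $H$.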

6.1. Insurance launch. Our first example illustrates profit maximization with information cost, which is the first example in Section 1.1. Here, $M_t$ represents the state of the economy with three major states $E = \{1, 2, 3\} = \{\text{Boom}, \text{Growth}, \text{Recession}\}$, and with the generator

$Q = \begin{pmatrix} -4 & 3 & 1 \\ 2 & -4 & 2 \\ 0 & 3 & -3 \end{pmatrix}$.

Let $\vec\lambda = [\lambda_1, \lambda_2, \lambda_3] = [1, 2, 5]$ and $\vec\nu = [\nu_1, \nu_2, \nu_3] = [\mathrm{Gamma}(3,2), \mathrm{Gamma}(4,2), \mathrm{Gamma}(5,2)]$. Conditional on the state of $M$ being $i \in E$, the frequency of claims is $\lambda_i$ and their common distribution is $\nu_i$. Here, we consider the objective function in (1.1) with $\vec\mu = [\mu_B, \mu_G, \mu_R] = [6, 1, -3]$, $\rho = 0.1$ and $c = -0.3$. As before, $d = 1$ represents the decision to launch the new policy; $d = 0$ represents the decision to abandon, and does not involve any cashflows. The horizon is taken to be $T = 0.8$ (whose unit is to be consistent with that of the $\lambda_i$'s; e.g., if $\lambda_i$ is in "customers per month", $T$ is in months).

For this example, we discretized $D = \{\vec\pi \in \mathbb{R}_+^3 : \pi_B + \pi_G + \pi_R = 1\}$ using 100 grid points in each dimension and computed $V_m$ such that $\|V_m - V_{m-1}\| \le 10^{-4}$. The triangular regions in Figure 2 show the region $D$. The corners $\{B, G, R\}$ correspond to points where the states {Boom, Growth, Recession} have posterior probabilities equal to 1 respectively. The left panel of Figure 2 shows the value function $V(0.8, \vec\pi)$ and the shaded region is $\{\vec\pi \in D : V(0.8,\vec\pi) = H(\vec\pi)\}$. Recall that it is optimal to stop as soon as $V(T-t, \vec\Pi_t) = H(\vec\Pi_t)$ and the corresponding stopping region is time-dependent. The right panel of Figure 2 illustrates this point by varying the problem horizon $T$. As expected from Remark 4.2, when $T$ decreases, stopping regions expand. In particular, we see that with very little time left ($T = 0.1$ and $T = 0.2$), it is optimal to stop whenever $\pi_B$ is high (where action $d = 1$ is chosen) or $\pi_R$ is high (where quitting, $d = 0$, is optimal). For longer horizons, the DM can afford to wait for favorable circumstances and release the product then. That is, stopping and selecting $d = 0$ is never optimal when time-to-maturity is not small. Also note that the terminal reward associated with $d = 1$ is higher than that of $d = 0$ around the corner $G$. Moreover, with the notation in Lemma 4.1 we have $r_G = c_G - \rho \mu_G + (\mu_B - \mu_G)\, q_{G,B} + (\mu_R - \mu_G)\, q_{G,R} = 1.6 > 0$. Then by Lemma 4.1, it is never optimal to stop around the corner $G$ (unless $T = 0$) as shown in the right panel of Figure 2.

FIGURE 2. Value function and stopping regions of the insurance launch example of Section 6.1. The left panel displays the value function $V(T,\vec\pi)$, for $\vec\pi \in D$ and $T = 0.8$. At $T = 0.8$, if the conditional likelihood process $\vec\Pi$ is in the shaded region, the DM stops and selects action $d = 1$. Otherwise, she continues observing until the first time $V(T-t, \vec\Pi_t) = H(\vec\Pi_t)$. The right panel shows the dependence of the stopping regions on the horizon $T$.

6.2. Bayesian regime detection. Recall the hypothesis testing problem in (1.3). Let $V(\infty,\vec\pi)$ denote the value function of this minimization problem on infinite horizon. With the notation in (4.12), it is shown in [12] that it is optimal to stop the first time the conditional probability process $\vec\Pi$ enters the region $\cup_{k \in E} \Gamma_{\infty,k}$ where $\Gamma_{\infty,k} = \{\vec\pi \in D : V(\infty,\vec\pi) = H_k(\vec\pi)\}$ in terms of the functions $H_k(\vec\pi) = \sum_{i \in E} \mu_{k,i} \pi_i$. Each $\Gamma_{\infty,k}$ is a convex region with non-empty interior around the $k$'th corner of the simplex $D$. Namely, an observer stops whenever the conditional likelihood of one of the hypotheses is sufficiently high. This structure also extends to the finite-horizon problem. Since $V(\infty,\vec\pi) \le V(T,\vec\pi)$, we have $\Gamma_{\infty,k} \subseteq \{\vec\pi \in D : V(T,\vec\pi) = H_k(\vec\pi)\}$ for $k \in E$ and $T < \infty$. In plain words, regardless of the remaining time to maturity, the observer selects immediately one of the hypotheses when the conditional likelihood process $\vec\Pi$ is around the corners of $D$ (i.e., if there is sufficient posterior statistical evidence).

In Figure 3, we illustrate the time-dependence of the solution structure using a simple example with two hypotheses $H_1 : \lambda = \lambda_1$ and $H_2 : \lambda = \lambda_2$ on the arrival rate only. The infinite-horizon problem with two hypotheses on the arrival rate was solved for the first time by [32] (with $\lambda_2 > \lambda_1$ without loss of generality). The authors showed that immediate stopping is optimal if and only if $\mu_{2,1} \mu_{1,2} (\lambda_2 - \lambda_1) \le \mu_{2,1} + \mu_{1,2}$ (see [32, Theorem 2.1]). Hence the inequality $\mu_{2,1} \mu_{1,2} (\lambda_2 - \lambda_1) > \mu_{2,1} + \mu_{1,2}$ has to be satisfied in any finite-horizon problem with non-trivial solution.
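
The non-triviality condition is easy to check for a given parameter set. The following Python sketch does so for the values used in Figure 3 ($\lambda_1 = 1$, $\lambda_2 = 5$, $\mu_{1,2} = \mu_{2,1} = 2$); the function name is an illustrative assumption.

```python
def has_nontrivial_continuation(mu12, mu21, lam1, lam2):
    """True when mu21 * mu12 * (lam2 - lam1) > mu21 + mu12, with lam2 > lam1."""
    if lam2 <= lam1:
        raise ValueError("expected lam2 > lam1")
    return mu21 * mu12 * (lam2 - lam1) > mu21 + mu12


# Parameters of the Figure 3 example.
figure3_case = has_nontrivial_continuation(mu12=2.0, mu21=2.0, lam1=1.0, lam2=5.0)

# Rates that are too close together make immediate stopping optimal.
close_rates_case = has_nontrivial_continuation(mu12=2.0, mu21=2.0, lam1=1.0, lam2=1.5)
```

For the Figure 3 parameters the left side equals 16 and the right side equals 4, so waiting has value at some posterior beliefs.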

In Figure 3, under $H_1$ the arrival rate is $\lambda_1 = 1$ while under $H_2$ it is $\lambda_2 = 5$. For the Bayes risk given in (1.3), we select $\mu_{1,2} = \mu_{2,1} = 2$ for the penalty costs for selecting the wrong hypothesis. This numerical example corresponds to the one considered in [32, Figures 2-3]. The left panel of Figure 3 shows the value functions $V(T,\cdot)$ with horizons $T = 0.1$, $T = 0.2$, $T = 0.4$ and $T = 2$ respectively, and the terminal reward $H(\vec\pi) = \min\{\mu_{1,2} \pi_2 ;\ \mu_{2,1}(1 - \pi_2)\}$ on the state space of $\pi_2 \in [0,1]$. We see that as more time is available to make the decision, the value function decreases, as expected. The right panel of Figure 3 shows that the continuation region widens as time to maturity increases. We also observe that the boundary curves approach the solution structure of the problem with infinite horizon. [32] obtain a continuation region of [0.22, 0.70], very close to ours of [0.230, 0.705] for $T \ge 1$.

Let us define the lower boundary curve $T \mapsto b_1(T) = \sup\{\pi_2 \in [0,1] : V(T,\vec\pi) = 2\pi_2\}$. Clearly $b_1(0) = 0.5$. In the right panel, we observe that the lower boundary curve $b_1(\cdot)$ has a discontinuity at $T = 0$ (jumping from $\pi_2 = 0.5$ to approximately $\pi_2 = 0.25$) and then remains constant until about $T = 0.2$. Note that the point $\vec\pi = (\pi_1, \pi_2) = (0.5, 0.5)$ is the global maximum of the terminal cost function $H(\vec\pi)$. Starting at the point $(0.5 + \varepsilon, 0.5 - \varepsilon)$, for $\varepsilon > 0$ small, as long as there is no jump, the conditional likelihood process $\vec\Pi$ drifts (quickly) toward the point $\vec\pi = (\pi_1, \pi_2) = (1, 0)$ and away from this maximum. For very small values of $T$, the probability of observing a jump is low and thus it is optimal to continue. Therefore, the lower curve in Figure 3 is discontinuous around $T = 0$. The rate of drift of the process $\vec\Pi$ to the point $(1,0)$ decreases as $\pi_2$ decreases and approaches the point $(1,0)$ (see (2.14)). As a result, at points $\vec\pi$ where $\pi_2$ is small, the effect of waiting cost becomes dominant and it is optimal to stop even if $T$ is small.

FIGURE 3. Bayesian regime detection example of Section 6.2. The left panel shows the value functions $V(T,\vec\pi)$ for various time horizons $T$. The right panel shows the stopping regions $\Gamma_{T,k}$ (namely $\Gamma_{T,0}$ below the lower curve and $\Gamma_{T,1}$ above the higher curve) for $T = 2$.
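
The drift described above can be made concrete for two simple hypotheses on the arrival rate. The Python sketch below assumes the standard posterior between arrivals, $\pi_2(t) = \pi_2 e^{-\lambda_2 t} / (\pi_1 e^{-\lambda_1 t} + \pi_2 e^{-\lambda_2 t})$, whose time derivative is $-(\lambda_2 - \lambda_1)\pi_2(1-\pi_2)$; the function names are illustrative and the formula is not quoted from (2.14).

```python
import math


def posterior_without_arrival(pi2, t, lam1, lam2):
    """Posterior probability of H2 after t time units with no arrival."""
    w1 = (1.0 - pi2) * math.exp(-lam1 * t)
    w2 = pi2 * math.exp(-lam2 * t)
    return w2 / (w1 + w2)


def drift_rate(pi2, lam1, lam2):
    """Time derivative of the posterior of H2 between arrivals."""
    return -(lam2 - lam1) * pi2 * (1.0 - pi2)


lam1, lam2 = 1.0, 5.0                       # rates of the Figure 3 example
after_wait = posterior_without_arrival(0.5, 0.1, lam1, lam2)
fast = abs(drift_rate(0.5, lam1, lam2))     # drift speed at the maximum of H
slow = abs(drift_rate(0.1, lam1, lam2))     # drift speed closer to (1, 0)
```

The drift is fastest at $\pi_2 = 0.5$ and slows as $\pi_2$ approaches zero, which is the mechanism behind the shape of the lower boundary near $T = 0$.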

The following remark summarizes our discussion on this problem and states that the behavior of the lower boundary curve around $T = 0$ holds for any set of parameters $\lambda_2 > \lambda_1$, $\mu_{1,2}$, $\mu_{2,1}$. Its proof can be found in the Appendix.

Remark 6.1. Consider the hypothesis-testing problem in (1.3) with two simple hypotheses on the arrival rate: $H_1 : \lambda = \lambda_1$ and $H_2 : \lambda = \lambda_2$ (with $\lambda_2 > \lambda_1$). The continuation region $C_T$ is non-empty (for $T > 0$) if and only if $\mu_{2,1} \mu_{1,2} (\lambda_2 - \lambda_1) > \mu_{2,1} + \mu_{1,2}$. The boundary curve $T \mapsto b_1(T) = \sup\{\pi_2 \in [0,1] : V(T,\vec\pi) = \mu_{1,2} \pi_2\}$ is discontinuous at $T = 0$, and there is an interval around $T = 0$ on which $b_1(\cdot)$ is constant.

6.3. Optimal replacement of a system. Here we consider the reliability problem in (1.4). In
this problem, the unobservable Markov process $M$ represents the current productivity of a given
machine, and the $n$'th state (defective state) of $M$ is absorbing. The objective is to find the best
time to replace the equipment in order to maximize the net lifetime earnings. The problem is
studied by [24] under certain assumptions on $(q_{i,j})_{i,j \in E}$, $\vec{\lambda}$, $\vec{\mu}$ and $\vec{c}$ such that the infinitesimal look-ahead
(ILA) rule $\tau_{ILA} := \inf\{t \ge 0 : \sum_{j} r_j \Pi_t^{(j)} \le 0\}$ is optimal, where $r_i = c_i + \sum_{j \ne i}(\mu_j - \mu_i)\, q_{i,j}$ (cf.
Lemma 4.1). More precisely these assumptions are (i) qt ^ for i = 1, . . . ,n — 1, with q n = (ii) 
f\ > T2 > ■ ■ ■ > r n = c n , with c n < (iii) < Ai < . . . < A n , (iv) qi n > A n ! — Aj for i = 1, . . . , n— 1. 

It follows as a corollary to [22, Theorem 3.1] that $\tau_{ILA} \wedge T$ is an optimal stopping rule for the finite
horizon problem under these assumptions. Therefore, the region $\{\vec{\pi} \in D : V(T, \vec{\pi}) = H(\vec{\pi})\}$ does
not depend on $T$. This occurs because the instantaneous revenue rates $r_j$ completely summarize
the relative worth of different machine states, and the sum $\sum_{i \in E} r_i \Pi_t^{(i)}$ is monotonically
non-increasing over time $\mathbb{P}^{\vec{\pi}}$-almost surely for all $\vec{\pi} \in D$ (see [24, Theorem 2]). Thus, $T$ only plays a
role insofar as allowing the DM to collect profits before the machine deteriorates.

FIGURE 4. Value function $V(T, \vec{\pi})$ of the reliability example of Section 6.3. The shaded
regions represent the computed stopping regions $\{\vec{\pi} \in D : V(T, \vec{\pi}) = H(\vec{\pi})\}$. Left panel
shows $T = 1.5$, right panel shows $T = 0.2$. The shaded regions are the same in both panels.
Note however the different z-scales. The panels also show the line $3.5\pi_1 + 1.5\pi_2 - \pi_3 = 0$,
which is the stopping boundary of the ILA rule in (6.1).

FIGURE 5. The second example for the reliability problem of Section 6.3 with the new
parameters in (6.2). In the left panel $T = 2$, in the middle $T = 0.5$, and in the right panel
$T = 0.1$. In each picture, the function $V(T, \vec{\pi})$ is plotted on $D$. The shaded regions are the
sets $\{\vec{\pi} \in D : V(T, \vec{\pi}) = H(\vec{\pi})\}$.

We illustrate this degeneracy in Figure 4. In this example, we select the parameters to fit
the framework of [24]. We have a machine that moves through three regimes $E = \{1, 2, 3\} =
\{\text{Good}, \text{Average}, \text{Poor}\}$ with transition matrix

$Q = \begin{pmatrix} -4 & 1.5 & 2.5 \\ 0 & -1.5 & 1.5 \\ 0 & 0 & 0 \end{pmatrix}.$

At different states, the running profit from operating the machine is $\vec{c} = [1, 0, -1]$, and shutting
down the machine for maintenance involves a cost of $\vec{\mu} = [-1, -1, 0]$. Thus, it is costly
to shut down a machine until it is in the Poor state. In each state, the breakdowns occur
according to independent Poisson processes with intensities $\vec{\lambda} = [2, 3, 4]$. In this setting we have
$\vec{r} = \{r_1, r_2, r_3\} = \{3.5, 1.5, -1\}$ so that

(6.1)    $\tau_{ILA} = \inf\{t \ge 0 : 3.5\,\Pi_t^{(1)} + 1.5\,\Pi_t^{(2)} - \Pi_t^{(3)} \le 0\}.$
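The rates in (6.1) can be reproduced from the stated parameters. The sketch below assumes the formula $r_i = c_i + \sum_{j \ne i}(\mu_j - \mu_i) q_{i,j}$ quoted earlier in this section; the function names `ila_rates` and `ila_should_stop` are illustrative and do not come from the paper.

```python
def ila_rates(Q, c, mu):
    """Instantaneous revenue rates r_i = c_i + sum_{j != i} (mu_j - mu_i) * q_{i,j}."""
    n = len(c)
    return [
        c[i] + sum((mu[j] - mu[i]) * Q[i][j] for j in range(n) if j != i)
        for i in range(n)
    ]


def ila_should_stop(pi, r):
    """ILA rule: stop as soon as the expected revenue rate sum_i r_i * pi_i is <= 0."""
    return sum(r_i * p_i for r_i, p_i in zip(r, pi)) <= 0.0


# Parameters of the first reliability example (Good, Average, Poor).
Q = [[-4.0, 1.5, 2.5],
     [0.0, -1.5, 1.5],
     [0.0, 0.0, 0.0]]
c = [1.0, 0.0, -1.0]
mu = [-1.0, -1.0, 0.0]

r = ila_rates(Q, c, mu)
```

Under these assumptions `r` evaluates to `[3.5, 1.5, -1.0]`, so the rule stops at the Poor corner and continues at the Good corner.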

The left and right panels of Figure 4 show the functions $V(T, \vec{\pi})$ and the regions $\{\vec{\pi} \in D : (T, \vec{\pi}) \in
\Gamma_T\}$ for $T = 1.5$ and $T = 0.2$ respectively. We see that $V(0.2, \vec{\pi}) \le V(1.5, \vec{\pi})$, but the regions
$\{\vec{\pi} \in D : V(T, \vec{\pi}) = H(\vec{\pi})\}$ for $T = 0.2$ and $T = 1.5$ completely match the region $\{\vec{\pi} \in D : 3.5\pi_1 +
1.5\pi_2 - \pi_3 \le 0\}$, at least modulo the $D$-discretization necessary for numerical implementation.
This degenerate structure would disappear if one removes some of the assumptions in [24], for
example the special form of the generator $Q$ and/or the arrival rates $\vec{\lambda}$ above. We give an example in
Figure 5 where

(6.2)    $Q = \begin{pmatrix} -1 & 0.5 & 0.5 \\ 0 & -0.5 & 0.5 \\ 0 & 0 & 0 \end{pmatrix}$  and  $\vec{\lambda} = [\lambda_1, \lambda_2, \lambda_3] = [1, 4, 7].$

We keep other parameters the same as in the previous example. In this example, the instantaneous
net gain $\sum_{i \in E} r_i \Pi_t^{(i)} = 1.5\,\Pi_t^{(1)} + 0.5\,\Pi_t^{(2)} - \Pi_t^{(3)}$ is no longer monotonically non-increasing $\mathbb{P}^{\vec{\pi}}$-almost
surely for all $\vec{\pi} \in D$. For example, using (2.14) it can be shown that $d\big(1.5\,x_1(t, \vec{\pi}) +
0.5\,x_2(t, \vec{\pi}) - x_3(t, \vec{\pi})\big)/dt\,\big|_{t=0} > 0$ at the point $\vec{\pi} = (\pi_1, \pi_2, \pi_3) = (0.45, 0.45, 0.1)$. Figure 5 shows
that the structure of the stopping region is indeed time dependent. The stopping region expands
as time to maturity decreases. Moreover, in this problem the transition rates of $M$ are lower.
Therefore, the DM can obtain positive net gain when $M$ starts from the state $\{1\}$ and there is
enough time to operate the system. Indeed, the first panel in Figure 5 shows that for $T = 2$ the
value function is positive around the corner $\{1\}$.
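The sign of the derivative quoted above can be checked numerically. The sketch assumes that (2.14) has the form $dx_i/dt = \sum_j q_{j,i} x_j - \lambda_i x_i + x_i \sum_j \lambda_j x_j$, which is how it is used in (A2.12); the helper names `drift` and `net_gain_slope` are illustrative.

```python
def drift(x, Q, lam):
    """Right-hand side of the no-jump dynamics, assuming the form used in (A2.12):
    dx_i/dt = sum_j q_{j,i} x_j - lam_i x_i + x_i * sum_j lam_j x_j."""
    n = len(x)
    lam_bar = sum(l * xi for l, xi in zip(lam, x))
    return [
        sum(Q[j][i] * x[j] for j in range(n)) - lam[i] * x[i] + x[i] * lam_bar
        for i in range(n)
    ]


def net_gain_slope(x, Q, lam, r):
    """Time derivative of sum_i r_i x_i(t) at t = 0 when starting from x."""
    return sum(r_i * d_i for r_i, d_i in zip(r, drift(x, Q, lam)))


point = [0.45, 0.45, 0.1]

# Second example, parameters (6.2): the net gain initially increases.
Q2 = [[-1.0, 0.5, 0.5], [0.0, -0.5, 0.5], [0.0, 0.0, 0.0]]
slope2 = net_gain_slope(point, Q2, [1.0, 4.0, 7.0], [1.5, 0.5, -1.0])

# First example: the net gain decreases at the same point.
Q1 = [[-4.0, 1.5, 2.5], [0.0, -1.5, 1.5], [0.0, 0.0, 0.0]]
slope1 = net_gain_slope(point, Q1, [2.0, 3.0, 4.0], [3.5, 1.5, -1.0])
```

Under this assumed form of (2.14), the slope is about $0.36 > 0$ for the parameters in (6.2) and negative for the first example, in line with the monotonicity discussion.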

6.4. Technology adoption example. To illustrate an example for the discrete cost structure of 
Section 5, we consider an IT company, which is planning to add a new technological feature to 
its products. The benefit of the technology is unknown, but will improve over time as customer 
awareness grows and production is streamlined. The company wishes to adopt the technology at 
the optimal time that best resolves the tension between early adoption (with high production costs) 
and late adoption (with opportunity costs due to late market entry). A similar setting has been 
studied recently by [39] and goes all the way back to [31]. 

Suppose that after $T$ years the technology becomes obsolete and let $M = \{M_t\}_{t \ge 0}$ represent
the profitability/value of the technology with state space $E = \{1, 2, 3\} = \{\text{Low}, \text{Med}, \text{High}\}$. The
generator of $M$ is

$Q = \begin{pmatrix} -2 & 2 & 0 \\ 0 & -2 & 2 \\ 0 & 0 & 0 \end{pmatrix}.$

Thus, $M$ sequentially moves through the phases Low $\to$ Med $\to$ High. The firm may incorporate
the feature at the minimal level (action $d = 1$), at the maximum level ($d = 2$), or not at all ($d = 0$).
The profit functions are given by

$(\mu_{k,i}) = \begin{pmatrix} -1 & 3 & 4 \\ -4 & 2 & 10 \end{pmatrix}, \quad k \in \{1, 2\},\ i \in E,$

with zero profit when $d = 0$.

The observation process $X$ corresponds to competitor contract sales and is represented by a
compound Poisson process with marks $Y_k \in B = \{1, 2\} = \{\text{Large}, \text{Small}\}$. The $M$-modulated
intensity of $X$ is $\vec{\lambda} = [\lambda_1, \lambda_2, \lambda_3] = [3, 5, 3]$ and the mark distributions on $B$ are
$[0.2, 0.8]$, $[0.5, 0.5]$, $[0.8, 0.2]$ respectively. Contracts signed by competitors are opportunity costs
and the objective function is of the type (5.1) (with zero discounting $\rho = 0$):



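$V(T, \vec{\pi}) = \sup_{(\tau, d)} \mathbb{E}^{\vec{\pi}}\Big[\sum_{k=1}^{N_\tau} K(Y_k) + \sum_{j=1}^{2} 1_{\{d = j\}} \Big(\sum_{i \in E} \mu_{j,i}\, 1_{\{M_\tau = i\}}\Big)\Big],$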



where T = l,-ftT(l) = S,K{2) = -1. 

The triangular regions in Figure 6 are the state space $D = \{\vec{\pi} \in \mathbb{R}^3_+ : \pi_{Low} + \pi_{Med} + \pi_{High} = 1\}$.
In the panels, we show how the stopping regions expand as maturity approaches (from
left to right), as indicated in Remark 4.2. When $T = 1$ (left panel), we see that if the DM stops, she
either selects $d = 1$, or $d = 2$ if there is sufficient evidence that $M$ is at Med or High respectively.
For $T = 1$, the decision $d = 0$ is never considered since the DM can wait for $M$ to move to better
states. Note that, if $T$ is small (middle and right panels) and if $M$ seems to be at the Low state, the
DM does not have enough time to wait for $M$ to jump to a new state. By stopping immediately,
she at least gets rid of the opportunity costs.

FIGURE 6. Value function $V(T, \vec{\pi})$ of the technology adoption example 6.4 plotted together
with the stopping regions (shaded: $d = 2$ lighter color, $d = 1$ darker, $d = 0$ black). Left
panel: $T = 1$, middle panel: $T = 0.25$, right panel: $T = 0.05$.

FIGURE 7. Stopping regions $\{\vec{\pi} \in D : V(T, \vec{\pi}) = H(\vec{\pi})\} \subset \mathbb{R}^4$ of the targeting example of
Section 6.5 for $T = 2$. On the left panel we illustrate the effect of the waiting cost $c$, with the
shaded polyhedra representing stopping regions for $c = -0.1$, $c = -0.2$, $c = -0.4$ respectively.
On the right panel we take $c = -0.2$, and we display the effect of changing the arrival rate
from $\lambda_L = 4$ (blue/lighter stopping region) to $\lambda_L = 10$ (red/darker stopping region).

Around the Med corner there is high competitor activity ($\lambda_2 = 5$), and this increases the
opportunity costs (given by $K(\cdot)$). As a result the DM always stops; she does not wait for $M$ to
move to the High state. Since the expected reward of minimal commitment is higher than that of
maximum commitment around this corner, she selects $d = 1$. The DM selects $d = 2$ only if there
is sufficient statistical evidence that the technology has reached its High benefit.

6.5. A targeting problem. As a final illustration we present a targeting example, where the 
objective is to maximize the probability of M belonging to some favorable set B C E. 

An industrial conglomerate is seeking a business-favorable government legislation and employs
a lobbyist for that purpose. The lobbyist maintains government contacts and will try to time her
action to maximize the probability of the law passing. Suppose the passage of legislation depends
on the current political climate $M_t$ in the country that can be one of the following four states:
$E = \{1, 2, 3, 4\} = \{\text{Libertarian}, \text{Conservative}, \text{Progressive}, \text{Socialist}\}$. For simplicity we assume
that the law will pass if the climate is in $B = \{\text{Libertarian}, \text{Progressive}\}$ and fail otherwise.
Suppose that the generator of $M$ is

$Q = \begin{pmatrix} -1 & 0.5 & 0.5 & 0 \\ 0.5 & -1.5 & 0.5 & 0.5 \\ 1 & 0.5 & -2 & 0.5 \\ 0 & 1 & 0.5 & -1.5 \end{pmatrix}.$

We postulate that the objective function is $\mathbb{E}^{\vec{\pi}}[c\tau] + \mathbb{P}^{\vec{\pi}}(M_\tau \in B)$, where the constant $c < 0$
denotes the running cost of maintaining the lobby. Information is obtained via a simple Poisson
process counting the passing of other business-friendly legislation, with $M$-modulated intensities
$\vec{\lambda} = [\lambda_L, \lambda_C, \lambda_P, \lambda_S] = [4, 3, 2, 1]$. The time horizon is $T = 2$ years.

Figure 7 shows the stopping regions of this example inside the tetrahedron $D$. The left panel
shows the effect of changing the waiting cost $c$: as $c$ increases in absolute value, the DM is more
"impatient" and will stop sooner (compare with Remark 4.5). The right panel of Figure 7 shows the
effect of increasing $\lambda_L$ from 4 to $\lambda_L = 10$. As intuition suggests, this shrinks the continuation region
because the data is now more informative. We see that the continuation region $C_T$ shrinks especially
around the 'Libertarian' corner, as the DM can now be fairly confident in detecting that regime
(as it has a much higher arrival intensity).

Appendix A1. Sample Paths of $\vec{\Pi}$

In this appendix, we prove Lemma 2.1, and we derive the characterization of the sample paths
given in (2.10-2.11).

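Proof of Lemma 2.1. Let $\Xi$ be a set of the form

$\Xi = \{N_{t_1} = m_1, \ldots, N_{t_k} = m_k;\ (Y_1, \ldots, Y_{m_k}) \in B\}$

where $0 = t_0 \le t_1 \le \ldots \le t_k = t$ with $0 \le m_1 \le \ldots \le m_k$ for $k \in \mathbb{N}$, and $B$ is a Borel set in
$\mathcal{B}(\mathbb{R}^{m_k})$. Since the $t_j$'s and $m_j$'s are arbitrary, to prove (2.9) it is then sufficient to establish

$\mathbb{E}^{\vec{\pi}}\big[1_\Xi \cdot 1_{\{M_t = i\}}\big] = \mathbb{E}^{\vec{\pi}}\Big[1_\Xi \cdot \frac{L_i^{\vec{\pi}}(t, N_t : (\sigma_k, Y_k), k \le N_t)}{L^{\vec{\pi}}(t, N_t : (\sigma_k, Y_k), k \le N_t)}\Big].$

Conditioning on the path of $M$, the left-hand side (LHS) above equals

$\mathrm{LHS} = \mathbb{E}^{\vec{\pi}}\Big[1_{\{M_t = i\}} \cdot \mathbb{P}^{\vec{\pi}}\big\{N_{t_1} = m_1, \ldots, N_{t_k} = m_k;\ (Y_1, \ldots, Y_{m_k}) \in B \,\big|\, M_s; s \le t\big\}\Big]$
$\quad = \mathbb{E}^{\vec{\pi}}\Big[1_{\{M_t = i\}} \int_{B \times T(t_1, \ldots, t_k)} \mathbb{P}^{\vec{\pi}}\big\{\sigma_1 \in ds_1, \ldots, \sigma_{m_k} \in ds_{m_k};\ Y_1 \in dy_1, \ldots, Y_{m_k} \in dy_{m_k} \,\big|\, M_s; s \le t\big\}\Big],$

where

$T(t_1, \ldots, t_k) = \{s_1, \ldots, s_{m_k} \in \mathbb{R}_+ : s_1 \le \ldots \le s_{m_k} \le t \text{ and } s_{m_j} \le t_j < s_{m_j + 1} \text{ for } j = 1, \ldots, k\}.$

Then, by Fubini's theorem we have

$\mathrm{LHS} = \mathbb{E}^{\vec{\pi}}\Big[1_{\{M_t = i\}} \int_{B \times T(t_1, \ldots, t_k)} e^{-I(t)} \prod_{l=1}^{m_k} \sum_{j \in E} 1_{\{M_{s_l} = j\}} \lambda_j f_j(y_l)\, ds_l\, \nu(dy_l)\Big]$
$\quad = \int_{B \times T(t_1, \ldots, t_k)} L_i^{\vec{\pi}}(t, m_k : (s_j, y_j), j \le m_k) \prod_{l=1}^{m_k} ds_l\, \nu(dy_l)$
$\quad = \int_{B \times T(t_1, \ldots, t_k)} \frac{L_i^{\vec{\pi}}(t, m_k : (s_j, y_j), j \le m_k)}{L^{\vec{\pi}}(t, m_k : (s_j, y_j), j \le m_k)}\; L^{\vec{\pi}}(t, m_k : (s_j, y_j), j \le m_k) \prod_{l=1}^{m_k} ds_l\, \nu(dy_l).$

Another application of Fubini's theorem gives

$\mathrm{LHS} = \mathbb{E}^{\vec{\pi}}\Big[\int_{B \times T(t_1, \ldots, t_k)} \frac{L_i^{\vec{\pi}}(t, m_k : (s_j, y_j), j \le m_k)}{L^{\vec{\pi}}(t, m_k : (s_j, y_j), j \le m_k)}\; e^{-I(t)} \prod_{l=1}^{m_k} \sum_{j \in E} 1_{\{M_{s_l} = j\}} \lambda_j f_j(y_l)\, ds_l\, \nu(dy_l)\Big]$
$\quad = \mathbb{E}^{\vec{\pi}}\Big[\mathbb{E}^{\vec{\pi}}\Big[1_{\{N_{t_1} = m_1, \ldots, N_{t_k} = m_k;\ (Y_1, \ldots, Y_{m_k}) \in B\}} \frac{L_i^{\vec{\pi}}(t, N_t : (\sigma_j, Y_j), j \le N_t)}{L^{\vec{\pi}}(t, N_t : (\sigma_j, Y_j), j \le N_t)} \,\Big|\, M_s; s \le t\Big]\Big]$
$\quad = \mathbb{E}^{\vec{\pi}}\Big[1_\Xi \cdot \frac{L_i^{\vec{\pi}}(t, N_t : (\sigma_j, Y_j), j \le N_t)}{L^{\vec{\pi}}(t, N_t : (\sigma_j, Y_j), j \le N_t)}\Big],$

and this concludes the proof. □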



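Proof of Remark 2.1. In order to establish (2.10-2.11), let $\mathbb{E}_j[\cdot]$ denote the expectation operator
$\mathbb{E}^{\vec{\pi}}[\,\cdot \mid M_0 = j]$, and let $t_m \le t \le t + u < t_{m+1}$. Here $t_m$ and $t_{m+1}$ can be considered as the sample
realizations $\sigma_m(\omega)$ and $\sigma_{m+1}(\omega)$ of the $m$'th and $(m+1)$'st arrival times respectively. Using the definition of
$L_i^{\vec{\pi}}$ in (2.7) we have

(A1.1)    $L_i^{\vec{\pi}}(t + u, m : (t_k, y_k), k \le m) = \sum_{j \in E} \pi_j\, \mathbb{E}_j\Big[1_{\{M_{t+u} = i\}} \cdot e^{-I(t+u)} \cdot \prod_{k=1}^{m} \ell(t_k, y_k)\Big]$
$\qquad = \sum_{j \in E} \pi_j\, \mathbb{E}_j\Big[e^{-I(t)} \prod_{k=1}^{m} \ell(t_k, y_k) \cdot \mathbb{E}_j\big[1_{\{M_{t+u} = i\}} \cdot e^{-(I(t+u) - I(t))} \,\big|\, M_s : s \le t\big]\Big].$

Using the Markov property of $M$, the last expression in (A1.1) can be written as

$\sum_{j \in E} \pi_j\, \mathbb{E}_j\Big[e^{-I(t)} \Big(\prod_{k=1}^{m} \ell(t_k, y_k)\Big) \cdot \sum_{l \in E} 1_{\{M_t = l\}} \cdot \mathbb{E}_l\big[1_{\{M_u = i\}} \cdot e^{-I(u)}\big]\Big]$
$\qquad = \sum_{l \in E} \mathbb{E}^{\vec{\pi}}\Big[1_{\{M_t = l\}} \cdot e^{-I(t)} \prod_{k=1}^{m} \ell(t_k, y_k)\Big] \cdot \mathbb{E}_l\big[1_{\{M_u = i\}} e^{-I(u)}\big]$
$\qquad = \sum_{l \in E} \mathbb{E}_l\big[1_{\{M_u = i\}} e^{-I(u)}\big] \cdot L_l^{\vec{\pi}}(t, m : (t_k, y_k), k \le m).$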



leE 



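Then the explicit form of $\vec{\Pi}$ in (2.9) implies that for $\sigma_m \le t \le t + u < \sigma_{m+1}$, we have

(A1.2)    $\Pi_i(t + u) = \frac{\sum_{l \in E} L_l^{\vec{\pi}}(t, m : (\sigma_k, y_k), k \le m) \cdot \mathbb{E}_l\big[1_{\{M_u = i\}} e^{-I(u)}\big]}{\sum_{j \in E} \sum_{l \in E} L_l^{\vec{\pi}}(t, m : (\sigma_k, y_k), k \le m) \cdot \mathbb{E}_l\big[1_{\{M_u = j\}} e^{-I(u)}\big]}$
$\qquad = \frac{\sum_{l \in E} \Pi_l(t) \cdot \mathbb{E}_l\big[1_{\{M_u = i\}} e^{-I(u)}\big]}{\sum_{j \in E} \sum_{l \in E} \Pi_l(t) \cdot \mathbb{E}_l\big[1_{\{M_u = j\}} e^{-I(u)}\big]} = \frac{\mathbb{P}^{\vec{\pi}}\{\sigma_1 > u, M_u = i\}}{\mathbb{P}^{\vec{\pi}}\{\sigma_1 > u\}}\bigg|_{\vec{\pi} = \vec{\Pi}_t}.$

On the other hand, the expression in (2.7) gives

(A1.3)    $L_i^{\vec{\pi}}(\sigma_{m+1}, m + 1 : (\sigma_k, Y_k), k \le m + 1) = \mathbb{E}^{\vec{\pi}}\Big[1_{\{M_t = i\}} e^{-I(t)} \prod_{k=1}^{m+1} \ell(t_k, y_k)\Big]\bigg|_{t = \sigma_{m+1},\ (t_k = \sigma_k, y_k = Y_k)_{k \le m+1}}$
$\qquad = \lambda_i f_i(Y_{m+1})\, \mathbb{E}^{\vec{\pi}}\Big[1_{\{M_t = i\}} e^{-I(t)} \prod_{k=1}^{m} \ell(t_k, y_k)\Big]\bigg|_{t = \sigma_{m+1},\ (t_k = \sigma_k, y_k = Y_k)_{k \le m}}.$

Observe that for fixed time $t$, we have $M_t = M_{t-}$, $\mathbb{P}^{\vec{\pi}}$-a.s., and $L_i^{\vec{\pi}}(t, m : (t_k, y_k), k \le m) =
L_i^{\vec{\pi}}(t-, m : (t_k, y_k), k \le m)$ when $t_m < t$. Then we have

$L_i^{\vec{\pi}}(\sigma_{m+1}, m + 1 : (\sigma_k, Y_k), k \le m + 1) = \lambda_i f_i(Y_{m+1}) \cdot L_i^{\vec{\pi}}(\sigma_{m+1}-, m : (\sigma_k, Y_k), k \le m),$

due to (A1.3). Hence, at arrival times $\sigma_1, \sigma_2, \ldots$ of $X$, the process $\vec{\Pi}$ exhibits a jump behavior and
satisfies the recursive relation

(A1.4)    $\Pi_i(\sigma_{m+1}) = \frac{\lambda_i f_i(Y_{m+1})\, L_i^{\vec{\pi}}(\sigma_{m+1}-, m : (\sigma_k, Y_k), k \le m)}{\sum_{j \in E} \lambda_j f_j(Y_{m+1})\, L_j^{\vec{\pi}}(\sigma_{m+1}-, m : (\sigma_k, Y_k), k \le m)} = \frac{\lambda_i f_i(Y_{m+1})\, \Pi_i(\sigma_{m+1}-)}{\sum_{j \in E} \lambda_j f_j(Y_{m+1})\, \Pi_j(\sigma_{m+1}-)}$

for $m \in \mathbb{N}$.

The identities in (A1.2) and (A1.4) give (2.10-2.11). By repeating (A1.1-A1.2) with $m = 0$
(i.e., with no arrivals on $[0, t + u]$), we see that the paths $t \mapsto \vec{x}(t, \vec{\pi})$ have the semigroup property
$\vec{x}(t + u, \vec{\pi}) = \vec{x}(u, \vec{x}(t, \vec{\pi}))$. □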
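The jump relation (A1.4) is a Bayes update that reweights each state by $\lambda_i f_i(y)$. As a minimal sketch, the function below applies it to the technology adoption parameters of Section 6.4; the name `jump_update` and the uniform prior are illustrative choices, not part of the paper.

```python
def jump_update(pi, lam, f_at_y):
    """Posterior after an arrival with mark y, following (A1.4):
    new_pi_i is proportional to lam_i * f_i(y) * pi_i."""
    weights = [l * f * p for l, f, p in zip(lam, f_at_y, pi)]
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("the observed mark has zero likelihood under the prior")
    return [w / total for w in weights]


# Section 6.4 parameters: intensities and mark distributions over (Large, Small).
lam = [3.0, 5.0, 3.0]
mark_dist = [[0.2, 0.8], [0.5, 0.5], [0.8, 0.2]]

prior = [1.0 / 3.0] * 3                      # illustrative uniform prior (assumption)
f_large = [row[0] for row in mark_dist]      # likelihood of a Large contract per state
posterior = jump_update(prior, lam, f_large)
```

Starting from the uniform prior, a single Large contract moves most of the mass to Med and High, because Med has the highest arrival rate and High the highest probability of Large marks.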



Appendix A2. Supplementary Results and Other Proofs 

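Proof of Proposition 3.1. The inequality $V_m(s, \vec{\pi}) \le V(s, \vec{\pi})$ is immediate. To show the second
inequality, let $\tau$ be an $\mathbb{F}$-stopping time with $\tau \le s$, $\mathbb{P}$-a.s. Then we have

(A2.1)    $\mathbb{E}^{\vec{\pi}}\Big[\int_0^{\tau} e^{-\rho t} C(\vec{\Pi}_t)\,dt + e^{-\rho\tau} H(\vec{\Pi}_\tau)\Big]$
$\quad = \mathbb{E}^{\vec{\pi}}\Big[\int_0^{\tau \wedge \sigma_m} e^{-\rho t} C(\vec{\Pi}_t)\,dt + e^{-\rho(\tau \wedge \sigma_m)} H(\vec{\Pi}_{\tau \wedge \sigma_m})$
$\qquad\qquad + 1_{\{\tau > \sigma_m\}} \Big(\int_{\sigma_m}^{\tau} e^{-\rho t} C(\vec{\Pi}_t)\,dt + e^{-\rho\tau} H(\vec{\Pi}_\tau) - e^{-\rho\sigma_m} H(\vec{\Pi}_{\sigma_m})\Big)\Big]$
$\quad \le \mathbb{E}^{\vec{\pi}}\Big[\int_0^{\tau \wedge \sigma_m} e^{-\rho t} C(\vec{\Pi}_t)\,dt + e^{-\rho(\tau \wedge \sigma_m)} H(\vec{\Pi}_{\tau \wedge \sigma_m})$
$\qquad\qquad + 1_{\{\tau > \sigma_m\}} e^{-\rho\sigma_m} \Big(\|C\| \int_0^{\tau - \sigma_m} e^{-\rho t}\,dt + e^{-\rho(\tau - \sigma_m)} H(\vec{\Pi}_\tau) - H(\vec{\Pi}_{\sigma_m})\Big)\Big]$
$\quad \le \mathbb{E}^{\vec{\pi}}\Big[\int_0^{\tau \wedge \sigma_m} e^{-\rho t} C(\vec{\Pi}_t)\,dt + e^{-\rho(\tau \wedge \sigma_m)} H(\vec{\Pi}_{\tau \wedge \sigma_m})\Big] + (T\|C\| + 2\|H\|) \cdot \mathbb{E}^{\vec{\pi}}\big[e^{-\rho\sigma_m} 1_{\{T > \sigma_m\}}\big],$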



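where the last line follows since $\tau \le s \le T$ and $\{\tau > \sigma_m\} \subseteq \{T > \sigma_m\}$. Using the Cauchy-Schwarz
inequality and the inequalities $\mathbb{P}^{\vec{\pi}}\{T > \sigma_m\} \le \mathbb{E}^{\vec{\pi}}\big[1_{\{T > \sigma_m\}}(T/\sigma_m)\big] \le T \cdot \mathbb{E}^{\vec{\pi}}[1/\sigma_m]$ we obtain

(A2.2)    $\mathbb{E}^{\vec{\pi}}\Big[\int_0^{\tau} e^{-\rho t} C(\vec{\Pi}_t)\,dt + e^{-\rho\tau} H(\vec{\Pi}_\tau)\Big] \le \mathbb{E}^{\vec{\pi}}\Big[\int_0^{\tau \wedge \sigma_m} e^{-\rho t} C(\vec{\Pi}_t)\,dt + e^{-\rho(\tau \wedge \sigma_m)} H(\vec{\Pi}_{\tau \wedge \sigma_m})\Big]$
$\qquad\qquad + (T\|C\| + 2\|H\|) \sqrt{T\, \mathbb{E}^{\vec{\pi}}[1/\sigma_m]\; \mathbb{E}^{\vec{\pi}}[e^{-2\rho\sigma_m}]}.$

Note that given $M$, we have $\mathbb{P}^{\vec{\pi}}[\sigma_1 > t \mid M] = e^{-I(t)}$, where $I(\cdot)$ is defined as in (2.8). With $\bar{\lambda} = \max_{i \in E} \lambda_i$, this implies

$\mathbb{E}^{\vec{\pi}}\big[e^{-u\sigma_1} \mid M\big] = \mathbb{E}^{\vec{\pi}}\Big[\int_{\sigma_1}^{\infty} u\, e^{-ut}\,dt \,\Big|\, M\Big] = \int_0^{\infty} \mathbb{P}^{\vec{\pi}}[\sigma_1 \le t \mid M]\, u\, e^{-ut}\,dt$
$\qquad = \int_0^{\infty} \big(1 - e^{-I(t)}\big)\, u\, e^{-ut}\,dt \le \int_0^{\infty} \big(1 - e^{-\bar{\lambda} t}\big)\, u\, e^{-ut}\,dt = \frac{\bar{\lambda}}{u + \bar{\lambda}}.$

The process $X$ has independent increments conditioned on $M$. Then, the inequality $\mathbb{E}^{\vec{\pi}}\big[e^{-u\sigma_m} \mid M\big] \le
\big(\bar{\lambda}/(u + \bar{\lambda})\big)^m$ follows by induction and we have

(A2.3)    $\mathbb{E}^{\vec{\pi}}\big[e^{-u\sigma_m}\big] \le \Big(\frac{\bar{\lambda}}{u + \bar{\lambda}}\Big)^m$

for all $m \in \mathbb{N}$. Moreover, since $1/\sigma_m = \int_0^{\infty} e^{-\sigma_m u}\,du$, the inequality in (A2.3) gives $\mathbb{E}^{\vec{\pi}}[1/\sigma_m] \le
\int_0^{\infty} \big(\bar{\lambda}/(u + \bar{\lambda})\big)^m\,du = \bar{\lambda}/(m - 1)$, for $m \ge 2$. By using this upper bound in (A2.2) and taking the
supremum of both sides we obtain (3.3). □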
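To see how quickly the approximation error shrinks with the number of arrivals $m$, one can substitute the two bounds $\mathbb{E}[1/\sigma_m] \le \bar{\lambda}/(m-1)$ and $\mathbb{E}[e^{-2\rho\sigma_m}] \le (\bar{\lambda}/(2\rho + \bar{\lambda}))^m$ into the last term of (A2.2). The sketch below evaluates the resulting expression; whether it coincides exactly with the constant in (3.3) is an assumption, and the function name `truncation_error_bound` and the sample inputs are hypothetical.

```python
import math


def truncation_error_bound(T, c_norm, h_norm, lam_bar, rho, m):
    """Error term of (A2.2) after inserting the bounds derived from (A2.3):
    (T*||C|| + 2*||H||) * sqrt(T * lam_bar/(m-1) * (lam_bar/(2*rho + lam_bar))**m)."""
    if m < 2:
        raise ValueError("the bound on E[1/sigma_m] requires m >= 2")
    inv_sigma_bound = lam_bar / (m - 1)
    laplace_bound = (lam_bar / (2.0 * rho + lam_bar)) ** m
    return (T * c_norm + 2.0 * h_norm) * math.sqrt(T * inv_sigma_bound * laplace_bound)


# Hypothetical inputs: T = 1, ||C|| = ||H|| = 1, largest arrival rate 4.
bound_no_discount_m5 = truncation_error_bound(1.0, 1.0, 1.0, 4.0, 0.0, 5)
bound_no_discount_m17 = truncation_error_bound(1.0, 1.0, 1.0, 4.0, 0.0, 17)
bound_discounted_m17 = truncation_error_bound(1.0, 1.0, 1.0, 4.0, 0.5, 17)
```

Without discounting the bound decays only like $1/\sqrt{m-1}$, while any $\rho > 0$ adds a geometric factor, so far fewer arrivals are needed for the same accuracy.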

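Proof of Lemma 3.2. Boundedness and monotonicity are immediate by the definition of the
operator $J$ in (3.6). To establish the convexity, we will show that the expression in (3.7) is convex (in
$\vec{\pi}$) for each $t$ and $s$.

We first note that $\mathbb{E}^{\vec{\pi}}[e^{-I(t)}] = \sum_{j \in E} \pi_j \mathbb{E}_j[e^{-I(t)}]$ and $m_i(t, \vec{\pi}) = \sum_{j \in E} \pi_j \mathbb{E}_j\big[1_{\{M_t = i\}} e^{-I(t)}\big]$
are linear in $\vec{\pi}$, where $m_i(t, \vec{\pi})$ is defined in (2.13) for $i \in E$ and $\mathbb{E}_j$ is the expectation
operator $\mathbb{E}[\,\cdot \mid M_0 = j]$ for $j \in E$. Then we see that the expression $\mathbb{E}^{\vec{\pi}}[e^{-I(t)}]\, e^{-\rho t} H(\vec{x}(t, \vec{\pi})) =
\max_{k \in A} e^{-\rho t} \sum_{i \in E} \mu_{k,i}\, m_i(t, \vec{\pi})$ is convex as the upper envelope of convex functions. Next we let
$\vec{\pi} \mapsto w(s, \vec{\pi})$ be a convex mapping for each $s \ge 0$. Then we have $w(s, \vec{\pi}) = \sup_{k \in K_s} \beta_{k,0}(s) + \beta_{k,1}(s)\pi_1 +
\ldots + \beta_{k,n}(s)\pi_n$, for some index set $K_s$, and each $\beta_{k,i}(s)$ is a function in $s$. Using this characterization
with the definition of the operator $S_i$ in (3.8) we obtain

$\int_0^t e^{-\rho u} \sum_{i \in E} \mathbb{E}^{\vec{\pi}}\big[1_{\{M_u = i\}} e^{-I(u)}\big] \lambda_i\, S_i w(s - u, \vec{x}(u, \vec{\pi}))\,du$
$\quad = \int_0^t e^{-\rho u} \sum_{i \in E} m_i(u, \vec{\pi})\, \lambda_i \int w\Big(s - u, \Big(\frac{\lambda_j f_j(y)\, m_j(u, \vec{\pi})}{\sum_{l \in E} \lambda_l f_l(y)\, m_l(u, \vec{\pi})}\Big)_{j \in E}\Big) f_i(y)\,\nu(dy)\,du$
$\quad = \int_0^t e^{-\rho u} \int \sup_{k \in K_{s-u}} \Big(\sum_{j \in E} \beta_{k,j}(s - u)\, \lambda_j f_j(y)\, m_j(u, \vec{\pi}) + \beta_{k,0}(s - u) \sum_{j \in E} \lambda_j f_j(y)\, m_j(u, \vec{\pi})\Big)\nu(dy)\,du.$

Since the expressions inside the supremum operator are linear in $\vec{\pi}$, the integrand in the inner integral
is convex, and therefore so is the expression above. Also note that $\int_0^t e^{-\rho u} \sum_{i \in E} m_i(u, \vec{\pi})\, C(\vec{x}(u, \vec{\pi}))\,du
= \int_0^t e^{-\rho u} \sum_{i \in E} c_i\, m_i(u, \vec{\pi})\,du$, where both the integrand and the integral are linear in $\vec{\pi}$. Finally,
as the sum of three convex functions, $\vec{\pi} \mapsto Jw(t, s, \vec{\pi})$ is convex. Since $J_0 w(s, \vec{\pi})$ is the supremum
of convex functions, it is again convex. □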

Proof of Lemma 3.3. Let us define $\mathcal{T}_T = \{(t, s) \in \mathbb{R}^2_+ : 0 \le t \le s,\ s \le T\}$. Then the mapping

$(t, s, \vec{\pi}) \mapsto \mathbb{E}^{\vec{\pi}}[e^{-I(t)}] \cdot e^{-\rho t} \cdot H(\vec{x}(t, \vec{\pi})) = \Big[\sum_{j \in E} \pi_j \mathbb{E}_j[e^{-I(t)}]\Big] e^{-\rho t} \cdot H(\vec{x}(t, \vec{\pi}))$

is continuous on the compact set $\mathcal{T}_T \times D$ due to the bounded convergence theorem, the continuity of $H(\cdot)$, and the regularity
of the paths $t \mapsto \vec{x}(t, \vec{\pi})$.

For a (bounded) continuous function $w(\cdot, \cdot)$ on $[0, T] \times D$, the function $S_i w(\cdot, \cdot)$ is again continuous
for $i \in E$ due to the bounded convergence theorem. Next let $(t_m, s_m, \vec{\pi}_m)_{m \in \mathbb{N}}$ be a sequence converging
to a point $(t, s, \vec{\pi}) \in \mathcal{T}_T \times D$, and let us denote $F_i(u, s, \vec{\pi}) = C(\vec{x}(u, \vec{\pi})) + \lambda_i S_i w(s - u, \vec{x}(u, \vec{\pi}))$ for
typographical convenience. Then



ft r , rt m 



l {Mu=l} e- Iiu) 



Fi(u,S m ,TT m )du 



< 



-pu 



+ 







s, 7r) du 

e -P« . £ ( E * l {Mu=l}e -^)] Fi (u, s, vr) - E*« 



{M u =i}< 



-/(«) 



, 7T m ) I du 



-pu 



-pu 



i&E 



Note that as $m \to \infty$, the second integrand above goes to 0, and the whole expression vanishes
due to the dominated convergence theorem. Hence, we conclude that $Jw(t, s, \vec{\pi})$ in (3.7) is continuous
on $\mathcal{T}_T \times D$. Since this last set is compact, it follows that $Jw(t, s, \vec{\pi})$ is uniformly continuous and
$(s, \vec{\pi}) \mapsto J_0 w(s, \vec{\pi}) = \sup_{t \le s} Jw(t, s, \vec{\pi})$ is continuous on $[0, T] \times D$. □



To prove Proposition 3.2, we first establish the following intermediate result. 



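Proposition A2.1. For every $\varepsilon > 0$, let us define

(A2.4)    $r_m^{\varepsilon}(s, \vec{\pi}) = \inf\{t \in [0, s] : Jv_m(t, s, \vec{\pi}) \ge J_0 v_m(s, \vec{\pi}) - \varepsilon\}, \quad \vec{\pi} \in D,$

$S_1^{\varepsilon}(s, \vec{\pi}) = r_0^{\varepsilon}(s, \vec{\pi}) \wedge \sigma_1$, and

$S_{m+1}^{\varepsilon}(s, \vec{\pi}) = \begin{cases} r_m^{\varepsilon/2}(s, \vec{\pi}) & \text{if } \sigma_1 > r_m^{\varepsilon/2}(s, \vec{\pi}), \\ \sigma_1 + S_m^{\varepsilon/2}(s - \sigma_1, \vec{\Pi}_{\sigma_1}) & \text{if } \sigma_1 \le r_m^{\varepsilon/2}(s, \vec{\pi}). \end{cases}$

Then, for every $m \ge 1$ we have

(A2.5)    $\mathbb{E}^{\vec{\pi}}\Big[\int_0^{S_m^{\varepsilon}(s, \vec{\pi})} e^{-\rho t} C(\vec{\Pi}_t)\,dt + e^{-\rho S_m^{\varepsilon}(s, \vec{\pi})} H\big(\vec{\Pi}_{S_m^{\varepsilon}(s, \vec{\pi})}\big)\Big] \ge v_m(s, \vec{\pi}) - \varepsilon.$

Proof. We will prove (A2.5) by induction on $m \in \mathbb{N}$. For $m = 1$, thanks to (3.6) and (A2.4)
the left-hand side of (A2.5) equals $\mathbb{E}^{\vec{\pi}}\Big[\int_0^{r_0^{\varepsilon}(s, \vec{\Pi}_0) \wedge \sigma_1} e^{-\rho t} C(\vec{\Pi}_t)\,dt + e^{-\rho (r_0^{\varepsilon}(s, \vec{\Pi}_0) \wedge \sigma_1)} H\big(\vec{\Pi}_{r_0^{\varepsilon}(s, \vec{\Pi}_0) \wedge \sigma_1}\big)\Big]
= JH(r_0^{\varepsilon}(s, \vec{\pi}), s, \vec{\pi}) = Jv_0(r_0^{\varepsilon}(s, \vec{\pi}), s, \vec{\pi}) \ge v_1(s, \vec{\pi}) - \varepsilon$, which proves (A2.5) for $m = 1$.
Now, let us suppose (A2.5) holds for $\varepsilon > 0$, and for some $m \ge 1$, and let us prove that it also
holds when $m$ is replaced by $m + 1$. Since $S_{m+1}^{\varepsilon}(s, \vec{\pi}) \wedge \sigma_1 = r_m^{\varepsilon/2}(s, \vec{\Pi}_0) \wedge \sigma_1$, we have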



IE 71 " 



E" 



S^, + l(s,if)Ao'l , 

■S^ +1 (s,7?)Acri 



+1 {s;+iM>^i) 



(s,n )A<Ji 



..ff n 



{r^^(s,rio)<ffi}" r>m 2 (s,n ); 1 "{rS^(s,llo)>ai} 

o"i+'Sm 2/2 (s-o-i,H CTl ) 



fi+Sm 2 ' 2 (s— <y\ ,n CT1 ) 



E 71 " 



e /2 ~* 

' Hi 



e- pt C(U t )dt + 1 



{r^ /2 ( s ,n )<^} 



inn 



+1 



(s,n ); 1 "{r^ /2 ( s ,n )> CT i} 



where the last line follows from the strong Markov property and where 

'e- pt C{Tit)dt + e -ps^ 2 (u,n) . R /jj 



/ m (n,7f)=E 7r 



Sm 2 (u,Tr) 



> v m (u,7r) - e/2. 



The inequality above follows from the induction hypothesis. Then we obtain 



Jo 



r^ /2 (s,fio)A(Ti 



+ 1 {r^ 2 ( S ,n )«7i} ii7r i n r^ /2 ( S ,n )J + l {rU 2 {s,U )>a 1 } 



e- pt C(U t )dt 



$= Jv_m(r_m^{\varepsilon/2}(s, \vec{\pi}), s, \vec{\pi}) - \varepsilon/2 \ge v_{m+1}(s, \vec{\pi}) - \varepsilon$. Here the equality follows from the definition of the operator
$J$ in (3.6) and the inequality follows from (A2.4). This concludes the proof of (A2.5). □

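Proof of Proposition 3.2. The inequality $V_m \ge v_m$ follows from (A2.5) since $S_m^{\varepsilon}(s, \vec{\pi}) \le s \wedge \sigma_m$
by construction. To prove the reverse inequality $V_m \le v_m$ we will show

(A2.6)    $\mathbb{E}^{\vec{\pi}}\Big[\int_0^{\tau \wedge \sigma_m} e^{-\rho t} C(\vec{\Pi}_t)\,dt + e^{-\rho(\tau \wedge \sigma_m)} \cdot H(\vec{\Pi}_{\tau \wedge \sigma_m})\Big] \le v_m(s, \vec{\pi}),$

for every bounded stopping time $\tau \le s$ and $m \in \mathbb{N}$, by showing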



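(A2.7)    $\mathbb{E}^{\vec{\pi}}\Big[\int_0^{\tau \wedge \sigma_m} e^{-\rho t} C(\vec{\Pi}_t)\,dt + e^{-\rho(\tau \wedge \sigma_m)} H(\vec{\Pi}_{\tau \wedge \sigma_m})\Big]$
$\quad \le \mathbb{E}^{\vec{\pi}}\Big[\int_0^{\tau \wedge \sigma_{m-k+1}} e^{-\rho t} C(\vec{\Pi}_t)\,dt + 1_{\{\tau < \sigma_{m-k+1}\}} e^{-\rho\tau} H(\vec{\Pi}_\tau)$
$\qquad\qquad + 1_{\{\tau \ge \sigma_{m-k+1}\}} e^{-\rho\sigma_{m-k+1}}\, v_{k-1}\big(s - \sigma_{m-k+1}, \vec{\Pi}_{\sigma_{m-k+1}}\big)\Big] =: RHS_{k-1},$

for $k = 1, \ldots, m + 1$. The inequality (A2.6) will then follow from (A2.7) by taking $k = m + 1$. For
$k = 1$, (A2.7) is satisfied as an equality since $v_0(s, \cdot) = H(\cdot)$, for all $s \in [0, T]$. Now, let us assume
(A2.7) holds for some $1 \le k < m + 1$, and let us prove that it also holds for $k + 1$.
Note that $RHS_{k-1}$ in (A2.7) can be written as $RHS_{k-1} = RHS_{k-1}^{(1)} + RHS_{k-1}^{(2)}$, in terms of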

l"TA<T m -k 

'o 



RHS ( f}\ = E 



RHS^ = E 



rAo- m _fe + i 

Ur «„. I / e^C(U t )dt 

^m — k 



1 



{T>0- m _fe + l} ( 



Lemma 3.1 implies that there exists an T^, -measurable random variable R m -k such that 

t A <r m _ fc +i = (a m -k + Rm-k) A <7 m _fc+i on {r > cr m _fc}. 
Moreover since r < s, we have R m -k < s — <r m _fc on {r > a m ^k}- Then we obtain RHSj t _ 1 

<^m — fc + 1 



E 



{l">(T m -fc}' 



e -^c(n t )(it+i {T > (Tm _ fc+l}e -^- fc + i ^_ 1 (« - <r m _ fe+1 ,n ff; 

k<&m-k + lj \ a m-k+tim-k 



Due to strong Markov property, the last expression can be written as 
(A2.8) 



RHS { k 2 \ = E 



l{r>(7 m _ fe } • e P ' Um ~ k 9k-\ [Rm-k, S - a m - k , IT (<7 m _fc) 



where c/ fe _i(r, «, ff) 

Thai 



E 



o 



e-^CCnOdt + l {r<(Tl} e^ r # n (r) + l^^e^Vk-i (u - <7i, II (<n) 



for r < u. Then, using the definition of the operator J in (3.6) we have 

gk-i(r,u,n) = Jv k -i{r, u, ff) < J Vk-i(u,n) = v k (u,Tr). 

As a result, we obtain RHS^ k 2 } 1 < E l{r>o- m _ fc } e ~ p ' CTm " few fc (tt — cr m _fc, IT (<n) 
implies 



, and this further 



(A2.9) E 







< RHSt-x = E 

/•TAcr m _fc 

'o 



TA(T m _ fe 





e-^C7(n t )dt + l {r<CTm _ fc} e-"- • H(U T ) 



< E 



e-^C(fl t )dt + l {T<ffm _ fc}e ^^(n r ) + l {T > CTm _ fc} • e- 



01 



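Since the last term equals $RHS_k$, this completes the proof of (A2.7) by induction. Equation (A2.6)
follows when we set $k = m + 1$. Finally, taking the supremum of both sides in (A2.6), we arrive at
the desired inequality $V_m \le v_m$. □

Proof of Lemma 3.5. Using the definition of the operator $J$ in (3.6) we obtain

$Jw(t, s, \vec{\pi}) = \mathbb{E}^{\vec{\pi}}\Big[\int_0^{t \wedge \sigma_1} e^{-\rho v} C(\vec{\Pi}_v)\,dv + 1_{\{t < \sigma_1\}} \cdot e^{-\rho t} H(\vec{\Pi}_t) + 1_{\{\sigma_1 \le t\}} \cdot e^{-\rho\sigma_1} w(s - \sigma_1, \vec{\Pi}_{\sigma_1})\Big]$
$\quad = \mathbb{E}^{\vec{\pi}}\Big[\int_0^{u \wedge \sigma_1} e^{-\rho v} C(\vec{\Pi}_v)\,dv + \int_{u \wedge \sigma_1}^{t \wedge \sigma_1} e^{-\rho v} C(\vec{\Pi}_v)\,dv - 1_{\{u < \sigma_1\}} \cdot e^{-\rho u} H(\vec{\Pi}_u) + 1_{\{u < \sigma_1\}} \cdot e^{-\rho u} H(\vec{\Pi}_u)$
$\qquad\qquad + 1_{\{t < \sigma_1\}} \cdot e^{-\rho t} H(\vec{\Pi}_t) + 1_{\{\sigma_1 \le u\}} e^{-\rho\sigma_1} w(s - \sigma_1, \vec{\Pi}_{\sigma_1}) + 1_{\{u < \sigma_1 \le t\}} \cdot e^{-\rho\sigma_1} w(s - \sigma_1, \vec{\Pi}_{\sigma_1})\Big]$
$\quad = Jw(u, s, \vec{\pi}) + \mathbb{E}^{\vec{\pi}}\Big[-1_{\{\sigma_1 > u\}} \cdot e^{-\rho u} H(\vec{\Pi}_u) + 1_{\{\sigma_1 > u\}} \Big(\int_u^{t \wedge \sigma_1} e^{-\rho v} C(\vec{\Pi}_v)\,dv\Big)$
$\qquad\qquad + 1_{\{\sigma_1 > u\}} \Big[1_{\{\sigma_1 > t\}} \cdot e^{-\rho t} H(\vec{\Pi}_t) + 1_{\{\sigma_1 \le t\}} \cdot e^{-\rho\sigma_1} \cdot w(s - \sigma_1, \vec{\Pi}_{\sigma_1})\Big]\Big].$

On $\{\sigma_1 > u\}$, we have $\sigma_1 \wedge t = u + (\sigma_1 \wedge (t - u)) \circ \theta_u$. Then the Markov property of $\vec{\Pi}$ gives

$Jw(t, s, \vec{\pi}) = Jw(u, s, \vec{\pi}) - \mathbb{P}^{\vec{\pi}}\{\sigma_1 > u\}\, e^{-\rho u} H(\vec{x}(u, \vec{\pi}))$
$\qquad\qquad + \mathbb{E}^{\vec{\pi}}\Big[1_{\{\sigma_1 > u\}}\, e^{-\rho u}\, \mathbb{E}^{\vec{\Pi}_u}\Big[\int_0^{(t-u) \wedge \sigma_1} e^{-\rho v} C(\vec{\Pi}_v)\,dv + 1_{\{\sigma_1 > t - u\}} e^{-\rho(t-u)} H(\vec{\Pi}_{t-u})$
$\qquad\qquad\qquad + 1_{\{\sigma_1 \le t - u\}} e^{-\rho\sigma_1} w(s - u - \sigma_1, \vec{\Pi}_{\sigma_1})\Big]\Big]$
$\quad = Jw(u, s, \vec{\pi}) - \mathbb{P}^{\vec{\pi}}\{\sigma_1 > u\}\, e^{-\rho u} H(\vec{x}(u, \vec{\pi})) + \mathbb{E}^{\vec{\pi}}\big[1_{\{\sigma_1 > u\}} \cdot e^{-\rho u} Jw(t - u, s - u, \vec{\Pi}_u)\big]$
$\quad = Jw(u, s, \vec{\pi}) + \mathbb{P}^{\vec{\pi}}\{\sigma_1 > u\}\, e^{-\rho u} \big[Jw(t - u, s - u, \vec{x}(u, \vec{\pi})) - H(\vec{x}(u, \vec{\pi}))\big].$ □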



Proof of Lemma 4.1. Let $\vec{e}_i \in D$ denote the point whose $i$'th component is equal to 1. To
establish the result it is sufficient to find a closed ball with strictly positive radius around $\vec{e}_i$ (e.g.,
a region of the form $\{\vec{\pi} \in D : \|\vec{\pi} - \vec{e}_i\| \le \delta\}$ for some $\delta > 0$, where $\|\cdot\|$ denotes the Euclidean norm
on $\mathbb{R}^n$) such that $H(\vec{\pi}) < v_1(s, \vec{\pi}) \le V(s, \vec{\pi})$ for all points on this closed ball.

We first note that there exists a closed ball $B_0$ around $\vec{e}_i$ with positive radius such that $H(\vec{\pi}) =
\max_{k \in A^*(i)} H_k(\vec{\pi})$, for $\vec{\pi} \in B_0$. Then on $B_0$ and for small $s > 0$ we have $v_1(s, \vec{\pi}) = \sup_{t \le s} JH(t, s, \vec{\pi}) =
\max_{k \in A^*(i)} \sup_{t \in [0, s]} J^{(k)} H(t, \vec{\pi})$, where

$J^{(k)} H(t, \vec{\pi}) = \mathbb{E}^{\vec{\pi}}\big[e^{-I(t)}\big]\, e^{-\rho t} H_k(\vec{x}(t, \vec{\pi})) + \int_0^t e^{-\rho u} \sum_{i \in E} m_i(u, \vec{\pi}) \big(C(\vec{x}(u, \vec{\pi})) + \lambda_i S_i H(\vec{x}(u, \vec{\pi}))\big)\,du.$



Then, using (2.14) we have dJ^ H(t,Tr)/dt\ t=Q 



\ ie-E / je£ ies / j&E 

~P- y] Aj7Tj flfc(7f) + ^ /UfcJ ^ %j7T/ - Aj7Tj + 71^ ^ A^ + C (n) + ^ XjTTjSjH^Tx). 
j&E J j&E \leE leE / j&E 

The right hand side of the inequality above is uniformly continuous on the compact set $D$. Its
value at the point $\vec{e}_i$ equals $c_i - \rho\mu_{k,i} + \sum_{j \ne i}(\mu_{k,j} - \mu_{k,i})\, q_{i,j} > 0$. Hence for some $\delta_k > 0$ there
exists an open ball (contained in $B_0$) with radius $\delta_k$ around $\vec{e}_i$ such that $dJ^{(k)} H(t, \vec{\pi})/dt\,|_{t=0} > 0$
for all the points in this ball. Let $B_k$ be the closed ball around the same point $\vec{e}_i$ with radius $\delta_k/2$.
Then on the intersection set $\bigcap_{k \in A^*(i)} B_k$, the mapping $\vec{\pi} \mapsto dJ^{(k)} H(t, \vec{\pi})/dt\,|_{t=0}$ is strictly positive
and $\sup_{t \ge 0} J^{(k)} H(t, \vec{\pi}) > H_k(\vec{\pi})$ for all $k \in A^*(i)$. This implies that $v_1(s, \vec{\pi}) > H(\vec{\pi})$ for all $s > 0$
on $\bigcap_{k \in A^*(i)} B_k$. □



Proof of Lemma 4-2. Let i G I* for I* defined in (4.14). To establish the result, we will find 
7r| < 1 such that H(tt) = Jow(s,tt) on {(s,tt) G [0, T] x D : irf < 7Tj < 1} for a bounded function 
w(-) < \\H\\ =~p = maxj p, ifk . Since V is bounded by the same upper bound (recall that q < 
for i G E by assumption) and satisfies V(s, tt) = JqV(s, tt) we will have H (•) = V(-) on this region. 
Part I: Let us first define 



(A2.10) F k (t,Tf) =¥? 



-i(t)~ P t 



H k (x(t,rr))+ / e-^J2 



C(x(t,Tv)) + XjJJ 



du. 



Since H(tt) < J w(s, 7?) = sup te[0 s ] Jw(t, s, tt) < sup t6[0 s ] m&x keA F k (t, tt) = max fcj4 sup t£[0 s ] F k (t, tt) 
(see (3.7)), it is enough to show that for some vr| < 1 we have sup t>0 F k (t, tt) = H k (rr) for all k G A. 

Let TTi < 1 be a value such that H(tt) = max^g^. h k (Tr), where A* = {k G A : p k ,i = p}- That 
is, we have p k ^ = ~pZ for all k £ A* (and i G I*). Note that 7Tj can for instance be selected as 



/i - mm tiJ p kJ 
kgA* 2JL — minfcj /Ufej - a&,.. 



TTi = max 



Let us then define the hitting time T{rf, TTi) — inf {i > : Xi(t, tt) < TTi}. For t < T(tt, TTi), we have 
maxfcg^lffc (x(t,Tr)) = max fce ^» H k (x(t, tt)), which implies max fee ^ F k (t, tt) = max fce ^* F k (t, tt). 
Note that we have 



(A2.ll) 

dF k {t,Tr) 
dt 



-(A; + p) • tf fc (x (t, tt)) + dHk ^^ + C (x(t, tt)) + Ai||H| 



where 



(A2.12) 



dH k (x(t, tt)) 
dt 



^Pk,i ( y^qjiXj(t,7T) - \jXi(t,TT) + Xi(t,7T)y^\jXj(t,7T) 



due to (2.14). Let us denote p = min^j p k ,i- For k G ^4*, we have H k (x(t,Tr)) = pxi(t,Tr) + 
Yli^i ^k,i x i{ti > ~fiXi(t, tt) + p{l — Xi{t, tt)). Using this inequality, we get an upper bound for the 
derivative in (A2.ll) as 



(A2.13) 
dF k (t,Tf) 
dt 



-i(t)- P t 



A(/i - jX) - PH){1 - Xi{t,TT)) - pfjkXi(t,7c) + CiXi(t,Tf) + 



dH k (x{t,Tf)) 
dt
where $\bar{\lambda} = \max_{j \in E} \lambda_j$. Moreover, using (A2.12) it can be shown that for $k \in A^*$ we have



(A2.14) 

dH k (x(t,ir)) 
dt 



^2xj(t,Tf)^2fj, k> iqji - ^2fj, k ,ikxi(t,Tf) + Hk,ixi(t, 7r) y XjXjit,^) 



l£E 



l&E 



l£E 



< np (max j -Xi(t,7r)) - ^ p kjl XiXi(t, tt) + /^(t, 7?) ^ XjXj(t, tt 
^ ' 3 ' ifr j^k 



+ XiXi(t,Tf) ^2fJ.k,lXl(t,TT) + y~]fl k ,lXl(t,w) y~]\jXj(t,tf) 



< ^1 — x,(t, 7r)^ • f 3 • /z • A + n • ^max |gy |J • p 

where the second line follows from the inequality YlieE A*>fc,z 9zz — (recall that p = p k ^ = max^j /ijy 
and = — Ylijki la)- The equations (A2.13) and (A2.14) then imply that for t < T(tt, ni), and for 



(A2.15) 



dF k (t,Tr) 
dt 



p/IXiit, 7?) + CiXi(t, TT ) + ( 1 - Xi(t, TT) ) ■ G } . 



where G = 4 • p, ■ A + n • (max/j |gy |) • p, — (p + A) • p. Note that the assumption 1 p > or a > 0' in 
Lemma 4.2 assures that dF k (t,vp)/dt\ t _ is negative as tti — > 1. Therefore, if we define 



7Tj = max < 7Tj , 



G 



pp- Ci + G 



max < 7Tj , 



4//A + n (max; j \qij\)fi- (p + X)p 
-Ci + np (max; j |^|) + 3A/J + (/Z- p)(p + A) 



< 1, 



we have dF k (t,Tr)/dt < on t G [0, T(tt, 7r n )] for all i £ i* and for all tt such that 7r, > 7Tj. This 
implies that JH(t, s, tt) < H(tt) on this region. 

Part II: Next, let T(tt, %{) be the hitting time of the deterministic path Xi(t, tt) to the level TTj. 
Below we show that there exists irf such that 



(A2.16) 



F k (t,T?) <E* 



tAcri 



</I< + m(l-7r|) <fT(7f) 



for all k G .4 (not just „4*) and for all f > T(tt, TTj) on the region {-7? G D; 7Tj > 7r|}. This will 
further imply that JH(t, s, tt) < H(tt) for all t > for a point tt falling on the latter region, and 
we will have H(tt) < JqH(s,tt) = sup t6 r 0)S i JH(t,s,n) < H{tt). 

Note that the first inequality in (A2.16) follows from C(-) < c and H(-) < pZ. For a given value 
irf the last inequality is true for all the points on {tt G D; 7Tj > 7r|} since 

#(7?) = SUp Hfc(7?) = jffiTTi + SUp Pk,i^i > pKi + m(l - 7Tj) > /X7rf + m(l - 7T?). 



fceA 



Hence it remains to show that the second inequality holds for some 7r|. 

For 7Tj > 7Tj we have i\i = %i + j^^^^ ^'j*' 71 "^ (it. Then, thanks to (2.14) we get > i\i — Hi 



T(TT,TTi) 



^ qjiXj (t, 7?) - Aja^t, tt) + a?j(t, 7?) XjXj (t, 7?) J > 



r(7?,7fi) 



A,- dt
Qu — \ ) m T(T?,TTi), which further implies
(A2.17) T(ir, h) > (ir< - h)R-q» ~ A»). 

Case I: p > 0. By (A2.17) we get the inequality E 77 exp f-p • T(7r,#j) Aaj) < 



E 71 " exp -p 



— TTi 

- —Qu + Aj 



A cri 



exp -p 



VTi — TTi 



-—Qu + Aj 



{M u =i} 



du. 



The last expression above is strictly decreasing in TTi and equals 1 at 7Tj = fci . Moreover the mapping 
Hi \-¥ /Z7Ti+/x(l — TTi) is increasing and equals pZ at Wi = 1. Therefore there exists a unique 7r| 6 [7Tj, 1) 
defined as 



(A2.18) ?r| = inf ^ tt; > t\ { : pE 71 exp -p 



TTi — TTi 



A <Jl 



< pTTi + /i(l - TTi)} < 1, 



such that the inequality in (A2.18) holds for all 7Tj £ [7r|, 1]. The definition of 7r| implies that for 
all the points 7r with TTi > tt| and for t > T(7r, 7Tj) we have 



E 



o 



< pE 



-pT(n,Tti)Aai 



JiE 



-pT(n,ni)Acri 



< pE^exp -p 



TTi — TTi 



-Qii + Aj 



A (Tl 



< /i7Tj + - 7Tj) < H(tt). 



This establishes (A2.16) and concludes the proof when p > 0. 

Case II: c > 0. If p > 0, arguments given for Case I still holds. Hence we assume that p = 0. 
Using (A2.17) again, we obtain 



E 77 



T(7T, TTi) A (71 



> E 71 " 



TTi — VTi 



+ A, 



A (7i 



itl I "»l 



TTj — 7Ti 
— <7tt + Aj 



The last expression above equals to at TTi = fti and it is strictly increasing in TTi for TTi > 
Therefore there exists a unique point 



tt| = inf I TTi > TTi : -cE w 



TTi — TTi 



A, 



A (71 



+ P < P^i + M(l - VTj) > < 1, 



Then for the points tt with 7Tj > 7r* and for t > T(7r, TTi) we have 



E 



cdt + p 



o 



cE[f Affl] +iU< CE T{tT, TTi) A (7 1 

< cE 77 



VT,; — 7T,; 



+ A S - 



A (Tl 



+ P< pTTi+ p(l - TTi) < H(tt), 



and this concludes the proof. 



□ 



Proof of Lemma 4.3. The first inequality in (4.18) is obvious. To show the second inequality, let
$\tau$ be an $\mathbb{F}$-stopping time. Then we have

\[
\begin{aligned}
E^{\pi}\Big[\int_0^{\tau} e^{-\rho t} C(\Pi_t)\,dt + e^{-\rho \tau} H(\Pi_{\tau})\Big]
 \le{}& E^{\pi}\Big[\int_0^{\tau \wedge T} e^{-\rho t} C(\Pi_t)\,dt + e^{-\rho (\tau \wedge T)} H(\Pi_{\tau \wedge T})\Big] \\
 &+ E^{\pi}\Big[1_{\{\tau > T\}}\Big(\int_T^{\tau} e^{-\rho t} C(\Pi_t)\,dt + e^{-\rho \tau} H(\Pi_{\tau}) - e^{-\rho T} H(\Pi_T)\Big)\Big].
\end{aligned}
\tag{A2.19}
\]

If $\rho > 0$, the last expectation above is bounded above by $e^{-\rho T}(\|C\| + 2\|H\|)$. Then, taking the
supremum over all $\tau$ on both sides, we obtain (4.18).

On the other hand, if $\rho = 0$ and $\max_{i \in E} c_i < 0$, we may safely restrict ourselves to the set
of stopping times $\tau$ for which $E[\tau] \le (\min_{k,i} \mu_{k,i} - \max_{k,i} \mu_{k,i}) / \max_{i \in E} c_i$: the expected reward
associated with any stopping time having a higher expected value is dominated by the reward
achieved upon stopping immediately. Then the second expectation in (A2.19) is bounded above
by

\[
2\,\|H\|\, P\{\tau > T\} \;\le\; 2\,\|H\|\,\frac{E[\tau]}{T} \;\le\; \frac{2\,\|H\|}{T} \cdot \frac{\min_{k,i} \mu_{k,i} - \max_{k,i} \mu_{k,i}}{\max_{i \in E} c_i},
\]

thanks to Markov's inequality. Then the inequality in (4.18) follows after taking the supremum
over $\tau$ again. □
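The restriction on $E[\tau]$ used above can be spelled out in one line. The following is only a sketch, under the assumption (not restated in this appendix) that the running reward has the form $C(\pi) = \sum_{i \in E} c_i \pi_i$ and that the terminal reward $H$ takes values between $\min_{k,i}\mu_{k,i}$ and $\max_{k,i}\mu_{k,i}$:

```latex
% Sketch only: assumes C(\pi) = \sum_i c_i \pi_i and
% \min_{k,i}\mu_{k,i} \le H \le \max_{k,i}\mu_{k,i}, with \rho = 0.
\begin{align*}
E^{\pi}\Big[\int_0^{\tau} C(\Pi_t)\,dt + H(\Pi_\tau)\Big]
  &\le \Big(\max_{i\in E} c_i\Big)\, E^{\pi}[\tau] + \max_{k,i}\mu_{k,i},
  \qquad
  H(\pi) \ge \min_{k,i}\mu_{k,i}.
\end{align*}
% Hence \tau can do at least as well as immediate stopping only if
\[
\Big(\max_{i\in E} c_i\Big)\, E^{\pi}[\tau] + \max_{k,i}\mu_{k,i} \ge \min_{k,i}\mu_{k,i}
\quad\Longleftrightarrow\quad
E^{\pi}[\tau] \le \frac{\min_{k,i}\mu_{k,i} - \max_{k,i}\mu_{k,i}}{\max_{i\in E} c_i},
\]
% where dividing by \max_i c_i < 0 reverses the inequality.
```

Both the numerator and the denominator on the right are non-positive, so the bound on $E^{\pi}[\tau]$ is non-negative, as it must be.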

Proof of (4.21). Let $U_\varepsilon^{(m)}$ denote the stopping rule in (4.20) for notational convenience. Since
$U_\varepsilon^{(m)} \wedge T \le U_\varepsilon^{(m)} \le U_0(\infty, \pi)$, the arguments of [12, Proposition 3.11 and Section 4.1] give



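\[
V(T,\pi) \le V(\infty,\pi) = E^{\pi}\Big[\int_0^{U_\varepsilon^{(m)} \wedge T} e^{-\rho t} C(\Pi_t)\,dt + e^{-\rho (U_\varepsilon^{(m)} \wedge T)}\, V\big(\infty, \Pi_{U_\varepsilon^{(m)} \wedge T}\big)\Big].
\]

On the event $\{U_\varepsilon^{(m)} \le T\}$, we use the inequality $V\big(\infty, \Pi_{U_\varepsilon^{(m)}}\big) - \varepsilon - \mathrm{Err}^{(m)} \le H\big(\Pi_{U_\varepsilon^{(m)}}\big)$,
$P^{\pi}$-a.s., to obtain

\[
\begin{aligned}
V(T,\pi) \le{}& E^{\pi}\Big[\int_0^{U_\varepsilon^{(m)} \wedge T} e^{-\rho t} C(\Pi_t)\,dt + e^{-\rho (U_\varepsilon^{(m)} \wedge T)}\, H\big(\Pi_{U_\varepsilon^{(m)} \wedge T}\big)\Big] + \varepsilon + \mathrm{Err}^{(m)} \\
 &+ E^{\pi}\Big[1_{\{U_\varepsilon^{(m)} > T\}}\, e^{-\rho T}\big(V(\infty, \Pi_T) - H(\Pi_T)\big)\Big] \\
 \le{}& E^{\pi}\Big[\int_0^{U_\varepsilon^{(m)} \wedge T} e^{-\rho t} C(\Pi_t)\,dt + e^{-\rho (U_\varepsilon^{(m)} \wedge T)}\, H\big(\Pi_{U_\varepsilon^{(m)} \wedge T}\big)\Big] \\
 &+ \varepsilon + \mathrm{Err}^{(m)} + e^{-\rho T}\, \mathrm{Err}^{(0)}\, P\{U_\varepsilon^{(m)} > T\}.
\end{aligned}
\]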

If $\rho > 0$, we obtain (4.21) by removing the last probability. Otherwise we can use Markov's
inequality $P\{U_\varepsilon^{(m)} > T\} \le E[U_\varepsilon^{(m)}]/T \le E[U_0]/T \le \max_{k,i} \mu_{k,i} / [(\min_{i \in E} c_i)\, T]$ as in the proof of
Lemma 4.18, and (4.21) follows. □

Proof of Remark 6.1. The first claim on immediate stopping if $\mu_{2,1}\mu_{1,2}(\lambda_2 - \lambda_1) \le \mu_{2,1} + \mu_{1,2}$
is an immediate corollary of [32, Theorem 2.1].

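Let us now assume that $\mu_{2,1}\mu_{1,2}(\lambda_2 - \lambda_1) > \mu_{2,1} + \mu_{1,2}$. For the problem with two hypotheses, we
have $H(\pi) = \min\{\mu_{1,2}\pi_2,\ \mu_{2,1}\pi_1\}$, and recall that $v_1(T,\pi) = \inf_{t \in [0,T]} JH(t,\pi)$. For $\pi = (\pi_1, \pi_2)$
with $\pi_2 \in \big(\lambda_1\mu_{2,1}/(\lambda_2\mu_{1,2} + \lambda_1\mu_{2,1}),\ \mu_{2,1}/(\mu_{2,1} + \mu_{1,2})\big)$ and for small $t > 0$, evaluating the expression
$JH(t,\pi)$ gives

\[
JH(t,\pi) = \big(\pi_1 e^{-\lambda_1 t} + \pi_2 e^{-\lambda_2 t}\big)\, \mu_{1,2}\, x_2(t,\pi)
 + \int_0^t \sum_{j=1}^{2} \pi_j e^{-\lambda_j u} \Big(1 + \lambda_j\, \frac{\mu_{2,1}\lambda_1 x_1(u,\pi)}{\lambda_1 x_1(u,\pi) + \lambda_2 x_2(u,\pi)}\Big)\, du,
\]

and using the dynamics of $t \mapsto x(t,\pi)$ in (2.14) we obtain

\[
\frac{\partial JH(t,\pi)}{\partial t} = [1 + \mu_{2,1}\lambda_1]\, \pi_1 e^{-\lambda_1 t} + [1 - \mu_{1,2}\lambda_2]\, \pi_2 e^{-\lambda_2 t}.
\tag{A2.20}
\]

With $t = 0$ and $\pi = \big(\mu_{1,2}/(\mu_{2,1} + \mu_{1,2}) + \delta,\ \mu_{2,1}/(\mu_{2,1} + \mu_{1,2}) - \delta\big)$, for $\delta > 0$ small, the derivative
becomes

\[
\frac{\partial JH(t,\pi)}{\partial t}\bigg|_{t=0,\ \pi=(\cdot,\cdot)} = \frac{\mu_{2,1} + \mu_{1,2} + \mu_{2,1}\mu_{1,2}(\lambda_1 - \lambda_2)}{\mu_{2,1} + \mu_{1,2}} + \delta\,(\mu_{2,1}\lambda_1 + \mu_{1,2}\lambda_2).
\]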

Under the assumption $\mu_{2,1}\mu_{1,2}(\lambda_2 - \lambda_1) > \mu_{2,1} + \mu_{1,2}$, the last expression is negative for $\delta$ sufficiently
small. This implies that $v_1(T,\pi) < H(\pi)$ for small values of $T > 0$ at points $\pi$ for which $\pi_2 =
\mu_{2,1}/(\mu_{2,1} + \mu_{1,2}) - \delta$, where

\[
\delta < \frac{\mu_{2,1}\mu_{1,2}(\lambda_2 - \lambda_1) - \mu_{2,1} - \mu_{1,2}}{(\mu_{2,1} + \mu_{1,2})\,(\mu_{2,1}\lambda_1 + \mu_{1,2}\lambda_2)}.
\]

Since $b_1(0) = \mu_{2,1}/(\mu_{2,1} + \mu_{1,2})$, it follows that the boundary curve $T \mapsto b_1(T)$ is discontinuous at
$T = 0$ (see the lower curve in Figure 3).
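The derivative formula (A2.20) and its sign near $t = 0$ can be checked numerically. The sketch below is not part of the original proof: the parameter values and the function names (`survival`, `x`, `integrand`, `jh`, `djh_dt`) are hypothetical, chosen only so that $\mu_{2,1}\mu_{1,2}(\lambda_2 - \lambda_1) > \mu_{2,1} + \mu_{1,2}$ holds. It evaluates $JH(t,\pi)$ by Simpson's rule from the integral expression above and compares a finite-difference derivative with the closed form.

```python
import math

# Hypothetical parameters (not from the paper), chosen so that
# MU21 * MU12 * (LAM2 - LAM1) > MU21 + MU12.
MU12, MU21 = 2.0, 2.0
LAM1, LAM2 = 1.0, 3.0


def survival(u, pi1, pi2):
    """Probability of no arrival on [0, u] under the prior (pi1, pi2)."""
    return pi1 * math.exp(-LAM1 * u) + pi2 * math.exp(-LAM2 * u)


def x(u, pi1, pi2):
    """Posterior (x_1, x_2) at time u given no arrival on [0, u]."""
    p = survival(u, pi1, pi2)
    return pi1 * math.exp(-LAM1 * u) / p, pi2 * math.exp(-LAM2 * u) / p


def integrand(u, pi1, pi2):
    """Integrand of JH; valid while the post-jump minimum in H is mu21 * pi1."""
    x1, x2 = x(u, pi1, pi2)
    post_jump_h = MU21 * LAM1 * x1 / (LAM1 * x1 + LAM2 * x2)
    return (pi1 * math.exp(-LAM1 * u) * (1.0 + LAM1 * post_jump_h)
            + pi2 * math.exp(-LAM2 * u) * (1.0 + LAM2 * post_jump_h))


def jh(t, pi1, pi2, n=200):
    """JH(t, pi) by the composite Simpson rule with n (even) subintervals."""
    h = t / n
    total = integrand(0.0, pi1, pi2) + integrand(t, pi1, pi2)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * integrand(k * h, pi1, pi2)
    _, x2 = x(t, pi1, pi2)
    # No-arrival term: survival probability times H(x(t, pi)) = mu12 * x_2(t, pi).
    return survival(t, pi1, pi2) * MU12 * x2 + total * h / 3.0


def djh_dt(t, pi1, pi2):
    """Closed-form time derivative of JH as stated in (A2.20)."""
    return ((1.0 + MU21 * LAM1) * pi1 * math.exp(-LAM1 * t)
            + (1.0 - MU12 * LAM2) * pi2 * math.exp(-LAM2 * t))


if __name__ == "__main__":
    pi1, pi2 = 0.55, 0.45  # pi2 = mu21 / (mu21 + mu12) - delta with delta = 0.05
    step = 1e-4
    fd = (jh(0.1 + step, pi1, pi2) - jh(0.1 - step, pi1, pi2)) / (2 * step)
    print("finite difference:", fd, "closed form:", djh_dt(0.1, pi1, pi2))
    print("JH(0.05) - H:", jh(0.05, pi1, pi2) - MU12 * pi2)
```

For these illustrative values the derivative at $t = 0$ is negative, so $JH(t,\pi)$ dips below $H(\pi) = \mu_{1,2}\pi_2$ for small $t$, which is the mechanism behind the discontinuity of $b_1$ at $T = 0$.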

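The expression in (A2.20) with $t = 0$ indicates that $\partial JH(t,\pi)/\partial t\,|_{t=0}$ is decreasing in $\pi_2$ and
vanishes at the point $\pi$ with

\[
\pi_2 = \frac{1 + \mu_{2,1}\lambda_1}{\mu_{2,1}\lambda_1 + \mu_{1,2}\lambda_2} < \frac{\mu_{2,1}}{\mu_{2,1} + \mu_{1,2}},
\]

where the inequality is due to the assumption $\mu_{2,1}\mu_{1,2}(\lambda_2 - \lambda_1) > \mu_{2,1} + \mu_{1,2}$. This implies that

\[
\Big\{(T,\pi) : \pi_2 < \frac{\mu_{2,1}}{\mu_{2,1} + \mu_{1,2}} \text{ and } V(T,\pi) = H(\pi)\Big\} \subset \Big\{(T,\pi) : \pi_2 \le \frac{1 + \mu_{2,1}\lambda_1}{\mu_{2,1}\lambda_1 + \mu_{1,2}\lambda_2}\Big\}.
\]

At the point $\pi$ with $\pi_2 = (1 + \mu_{2,1}\lambda_1)/(\mu_{2,1}\lambda_1 + \mu_{1,2}\lambda_2)$ the expression for $\partial JH(t,\pi)/\partial t$ in (A2.20)
is strictly positive for small $t > 0$. Then we can find a value of $u > 0$ such that

\[
v_1(T,\pi) = H(\pi), \quad \text{for } \pi = \Big(\frac{\mu_{1,2}\lambda_2 - 1}{\mu_{2,1}\lambda_1 + \mu_{1,2}\lambda_2},\ \frac{1 + \mu_{2,1}\lambda_1}{\mu_{2,1}\lambda_1 + \mu_{1,2}\lambda_2}\Big) \text{ and } T \in [0,u].
\]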

This further implies

\[
v_1(T,\pi) = H(\pi) \quad \text{on } \Big\{(T,\pi) : T \in [0,u] \text{ and } \pi_2 \le \frac{1 + \mu_{2,1}\lambda_1}{\mu_{2,1}\lambda_1 + \mu_{1,2}\lambda_2}\Big\},
\]

since the region $\{\pi \in D : V(T,\pi) = H(\pi)\}$ is convex for each $T$ (see Remark 4.3), and we have
$v_1(T,\pi) = H(\pi)$ for all $T > 0$ at $\pi = (1,0)$. Recall that the deterministic part $t \mapsto x(t,\pi)$ drifts
towards the point $(1,0)$. Then, by induction we conclude that $v_n(T,\pi) = H(\pi)$ for all $n \in \mathbb{N}$, which
implies that $\lim_{n \to \infty} v_n(T,\pi) = V(T,\pi) = H(\pi)$ on the same region.
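The levels that appear in this argument are simple functions of the four parameters, and the bound on $\delta$ is exactly the distance between the kink of $H$ and the level at which the time derivative of $JH$ vanishes. The helper below is an illustrative sketch; the name `boundary_levels` and the sample values are assumptions made for this example, not taken from the paper.

```python
def boundary_levels(mu12, mu21, lam1, lam2):
    """Return (kink, flat, gap) for the two-hypothesis problem.

    kink: pi_2 level where the two branches of H meet, mu21 / (mu21 + mu12)
    flat: pi_2 level where the time derivative of JH at t = 0 vanishes
    gap:  the upper bound on delta, which equals kink - flat
    """
    s = mu21 + mu12
    if mu21 * mu12 * (lam2 - lam1) <= s:
        # In this case the proof says immediate stopping is optimal.
        raise ValueError("condition mu21*mu12*(lam2-lam1) > mu21+mu12 fails")
    d = mu21 * lam1 + mu12 * lam2
    kink = mu21 / s
    flat = (1.0 + mu21 * lam1) / d
    gap = (mu21 * mu12 * (lam2 - lam1) - s) / (s * d)
    return kink, flat, gap


if __name__ == "__main__":
    # Hypothetical parameter values, used only for illustration.
    kink, flat, gap = boundary_levels(mu12=2.0, mu21=2.0, lam1=1.0, lam2=3.0)
    print("kink:", kink, "flat level:", flat, "jump size:", gap)
```

The identity `kink - gap == flat` holds algebraically for any parameters satisfying the condition, which is why the size of the jump in $b_1$ at $T = 0$ matches the admissible range of $\delta$.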

As a result, we see that if the solution of the problem is not trivial, the lower boundary curve
$b_1(T)$ is discontinuous at $T = 0$, and there is an initial region over which the curve stays flat at
level $\pi_2 = (1 + \mu_{2,1}\lambda_1)/(\mu_{2,1}\lambda_1 + \mu_{1,2}\lambda_2)$ as in Figure 3. □